gitweb.ps.run Git - sporegirl/blob - src/stb_image.h

   1 /* stb_image - v2.30 - public domain image loader - http://nothings.org/stb
   2                                   no warranty implied; use at your own risk
   3
   4    Do this:
   5       #define STB_IMAGE_IMPLEMENTATION
   6    before you include this file in *one* C or C++ file to create the implementation.
   7
   8    // i.e. it should look like this:
   9    #include ...
  10    #include ...
  11    #include ...
  12    #define STB_IMAGE_IMPLEMENTATION
  13    #include "stb_image.h"
  14
  15    You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
  16    And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
  17
  18
  19    QUICK NOTES:
  20       Primarily of interest to game developers and other people who can
  21           avoid problematic images and only need the trivial interface
  22
  23       JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
  24       PNG 1/2/4/8/16-bit-per-channel
  25
  26       TGA (not sure what subset, if a subset)
  27       BMP non-1bpp, non-RLE
  28       PSD (composited view only, no extra channels, 8/16 bit-per-channel)
  29
  30       GIF (*comp always reports as 4-channel)
  31       HDR (radiance rgbE format)
  32       PIC (Softimage PIC)
  33       PNM (PPM and PGM binary only)
  34
  35       Animated GIF still needs a proper API, but here's one way to do it:
  36           http://gist.github.com/urraka/685d9a6340b26b830d49
  37
  38       - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
  39       - decode from arbitrary I/O callbacks
  40       - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
  41
  42    Full documentation under "DOCUMENTATION" below.
  43
  44
  45 LICENSE
  46
  47   See end of file for license information.
  48
  49 RECENT REVISION HISTORY:
  50
  51       2.30  (2024-05-31) avoid erroneous gcc warning
  52       2.29  (2023-05-xx) optimizations
  53       2.28  (2023-01-29) many error fixes, security errors, just tons of stuff
  54       2.27  (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes
  55       2.26  (2020-07-13) many minor fixes
  56       2.25  (2020-02-02) fix warnings
  57       2.24  (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
  58       2.23  (2019-08-11) fix clang static analysis warning
  59       2.22  (2019-03-04) gif fixes, fix warnings
  60       2.21  (2019-02-25) fix typo in comment
  61       2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
  62       2.19  (2018-02-11) fix warning
  63       2.18  (2018-01-30) fix warnings
  64       2.17  (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
  65       2.16  (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
  66       2.15  (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
  67       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
  68       2.13  (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
  69       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
  70       2.11  (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
  71                          RGB-format JPEG; remove white matting in PSD;
  72                          allocate large structures on the stack;
  73                          correct channel count for PNG & BMP
  74       2.10  (2016-01-22) avoid warning introduced in 2.09
  75       2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
  76
  77    See end of file for full revision history.
  78
  79
  80  ============================    Contributors    =========================
  81
  82  Image formats                          Extensions, features
  83     Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
  84     Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
  85     Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
  86     Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
  87     Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
  88     Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
  89     Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
  90     github:urraka (animated gif)           Junggon Kim (PNM comments)
  91     Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
  92                                            socks-the-fox (16-bit PNG)
  93                                            Jeremy Sawicki (handle all ImageNet JPGs)
  94  Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
  95     Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
  96     Arseny Kapoulkine                      Simon Breuss (16-bit PNM)
  97     John-Mark Allen
  98     Carmelo J Fdez-Aguera
  99
 100  Bug & warning fixes
 101     Marc LeBlanc            David Woo          Guillaume George     Martins Mozeiko
 102     Christpher Lloyd        Jerry Jansson      Joseph Thomson       Blazej Dariusz Roszkowski
 103     Phil Jordan                                Dave Moore           Roy Eltham
 104     Hayaki Saito            Nathan Reed        Won Chun
 105     Luke Graham             Johan Duparc       Nick Verigakis       the Horde3D community
 106     Thomas Ruf              Ronny Chevalier                         github:rlyeh
 107     Janez Zemva             John Bartholomew   Michal Cichon        github:romigrou
 108     Jonathan Blow           Ken Hamada         Tero Hanninen        github:svdijk
 109     Eugene Golushkov        Laurent Gomila     Cort Stratton        github:snagar
 110     Aruelien Pocheville     Sergio Gonzalez    Thibault Reuille     github:Zelex
 111     Cass Everitt            Ryamond Barbiero                        github:grim210
 112     Paul Du Bois            Engin Manap        Aldo Culquicondor    github:sammyhw
 113     Philipp Wiesemann       Dale Weiler        Oriol Ferrer Mesia   github:phprus
 114     Josh Tobin              Neil Bickford      Matthew Gregan       github:poppolopoppo
 115     Julian Raschke          Gregory Mullen     Christian Floisand   github:darealshinji
 116     Baldur Karlsson         Kevin Schmidt      JR Smith             github:Michaelangel007
 117                             Brad Weinberger    Matvey Cherevko      github:mosra
 118     Luca Sas                Alexander Veselov  Zack Middleton       [reserved]
 119     Ryan C. Gordon          [reserved]                              [reserved]
 120                      DO NOT ADD YOUR NAME HERE
 121
 122                      Jacko Dirks
 123
 124   To add your name to the credits, pick a random blank space in the middle and fill it.
 125   80% of merge conflicts on stb PRs are due to people adding their name at the end
 126   of the credits.
 127 */
 128
 129 #ifndef STBI_INCLUDE_STB_IMAGE_H
 130 #define STBI_INCLUDE_STB_IMAGE_H
 131
 132 // DOCUMENTATION
 133 //
 134 // Limitations:
 135 //    - no 12-bit-per-channel JPEG
 136 //    - no JPEGs with arithmetic coding
 137 //    - GIF always returns *comp=4
 138 //
 139 // Basic usage (see HDR discussion below for HDR usage):
 140 //    int x,y,n;
 141 //    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
 142 //    // ... process data if not NULL ...
 143 //    // ... x = width, y = height, n = # 8-bit components per pixel ...
 144 //    // ... replace '0' with '1'..'4' to force that many components per pixel
 145 //    // ... but 'n' will always be the number that it would have been if you said 0
 146 //    stbi_image_free(data);
 147 //
 148 // Standard parameters:
 149 //    int *x                 -- outputs image width in pixels
 150 //    int *y                 -- outputs image height in pixels
 151 //    int *channels_in_file  -- outputs # of image components in image file
 152 //    int desired_channels   -- if non-zero, # of image components requested in result
 153 //
 154 // The return value from an image loader is an 'unsigned char *' which points
 155 // to the pixel data, or NULL on an allocation failure or if the image is
 156 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
 157 // with each pixel consisting of N interleaved 8-bit components; the first
 158 // pixel pointed to is top-left-most in the image. There is no padding between
 159 // image scanlines or between pixels, regardless of format. The number of
 160 // components N is 'desired_channels' if desired_channels is non-zero, or
 161 // *channels_in_file otherwise. If desired_channels is non-zero,
 162 // *channels_in_file has the number of components that _would_ have been
 163 // output otherwise. E.g. if you set desired_channels to 4, you will always
 164 // get RGBA output, but you can check *channels_in_file to see if it's trivially
 165 // opaque because e.g. there were only 3 channels in the source image.
 166 //
 167 // An output image with N components has the following components interleaved
 168 // in this order in each pixel:
 169 //
 170 //     N=#comp     components
 171 //       1           grey
 172 //       2           grey, alpha
 173 //       3           red, green, blue
 174 //       4           red, green, blue, alpha
 175 //
 176 // If image loading fails for any reason, the return value will be NULL,
 177 // and *x, *y, *channels_in_file will be unchanged. The function
 178 // stbi_failure_reason() can be queried for an extremely brief, end-user
 179 // unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
 180 // to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
 181 // more user-friendly ones.
 182 //
 183 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
 184 //
 185 // To query the width, height and component count of an image without having to
 186 // decode the full file, you can use the stbi_info family of functions:
 187 //
 188 //   int x,y,n,ok;
 189 //   ok = stbi_info(filename, &x, &y, &n);
 190 //   // returns ok=1 and sets x, y, n if image is a supported format,
 191 //   // 0 otherwise.
 192 //
 193 // Note that stb_image pervasively uses ints in its public API for sizes,
 194 // including sizes of memory buffers. This is now part of the API and thus
 195 // hard to change without causing breakage. As a result, the various image
 196 // loaders all have certain limits on image size; these differ somewhat
 197 // by format but generally boil down to either just under 2GB or just under
 198 // 1GB. When the decoded image would be larger than this, stb_image decoding
 199 // will fail.
 200 //
 201 // Additionally, stb_image will reject image files that have any of their
 202 // dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS,
 203 // which defaults to 2**24 = 16777216 pixels. Due to the above memory limit,
 204 // the only way to have an image with such dimensions load correctly
 205 // is for it to have a rather extreme aspect ratio. Either way, the
 206 // assumption here is that such larger images are likely to be malformed
 207 // or malicious. If you do need to load an image with individual dimensions
 208 // larger than that, and it still fits in the overall size limit, you can
 209 // #define STBI_MAX_DIMENSIONS on your own to be something larger.
 210 //
 211 // ===========================================================================
 212 //
 213 // UNICODE:
 214 //
 215 //   If compiling for Windows and you wish to use Unicode filenames, compile
 216 //   with
 217 //       #define STBI_WINDOWS_UTF8
 218 //   and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
 219 //   Windows wchar_t filenames to utf8.
 220 //
 221 // ===========================================================================
 222 //
 223 // Philosophy
 224 //
 225 // stb libraries are designed with the following priorities:
 226 //
 227 //    1. easy to use
 228 //    2. easy to maintain
 229 //    3. good performance
 230 //
 231 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
 232 // and for best performance I may provide less-easy-to-use APIs that give higher
 233 // performance, in addition to the easy-to-use ones. Nevertheless, it's important
 234 // to keep in mind that from the standpoint of you, a client of this library,
 235 // all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
 236 //
 237 // Some secondary priorities arise directly from the first two, some of which
 238 // provide more explicit reasons why performance can't be emphasized.
 239 //
 240 //    - Portable ("ease of use")
 241 //    - Small source code footprint ("easy to maintain")
 242 //    - No dependencies ("ease of use")
 243 //
 244 // ===========================================================================
 245 //
 246 // I/O callbacks
 247 //
 248 // I/O callbacks allow you to read from arbitrary sources, like packaged
 249 // files or some other source. Data read from callbacks are processed
 250 // through a small internal buffer (currently 128 bytes) to try to reduce
 251 // overhead.
 252 //
 253 // The three functions you must define are "read" (reads some bytes of data),
 254 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
 255 //
 256 // ===========================================================================
 257 //
 258 // SIMD support
 259 //
 260 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
 261 // supported by the compiler. For ARM Neon support, you must explicitly
 262 // request it.
 263 //
 264 // (The old do-it-yourself SIMD API is no longer supported in the current
 265 // code.)
 266 //
 267 // On x86, SSE2 will automatically be used when available based on a run-time
 268 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
 269 // the typical path is to have separate builds for NEON and non-NEON devices
 270 // (at least this is true for iOS and Android). Therefore, the NEON support is
 271 // toggled by a build flag: define STBI_NEON to get NEON loops.
 272 //
 273 // If for some reason you do not want to use any of SIMD code, or if
 274 // you have issues compiling it, you can disable it entirely by
 275 // defining STBI_NO_SIMD.
 276 //
 277 // ===========================================================================
 278 //
 279 // HDR image support   (disable by defining STBI_NO_HDR)
 280 //
 281 // stb_image supports loading HDR images in general, and currently the Radiance
 282 // .HDR file format specifically. You can still load any file through the existing
 283 // interface; if you attempt to load an HDR file, it will be automatically remapped
 284 // to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
 285 // both of these constants can be reconfigured through this interface:
 286 //
 287 //     stbi_hdr_to_ldr_gamma(2.2f);
 288 //     stbi_hdr_to_ldr_scale(1.0f);
 289 //
 290 // (note, do not use _inverse_ constants; stbi_image will invert them
 291 // appropriately).
 292 //
 293 // Additionally, there is a new, parallel interface for loading files as
 294 // (linear) floats to preserve the full dynamic range:
 295 //
 296 //    float *data = stbi_loadf(filename, &x, &y, &n, 0);
 297 //
 298 // If you load LDR images through this interface, those images will
 299 // be promoted to floating point values, run through the inverse of
 300 // constants corresponding to the above:
 301 //
 302 //     stbi_ldr_to_hdr_scale(1.0f);
 303 //     stbi_ldr_to_hdr_gamma(2.2f);
 304 //
 305 // Finally, given a filename (or an open file or memory block--see header
 306 // file for details) containing image data, you can query for the "most
 307 // appropriate" interface to use (that is, whether the image is HDR or
 308 // not), using:
 309 //
 310 //     stbi_is_hdr(char *filename);
 311 //
 312 // ===========================================================================
 313 //
 314 // iPhone PNG support:
 315 //
 316 // We optionally support converting iPhone-formatted PNGs (which store
 317 // premultiplied BGRA) back to RGB, even though they're internally encoded
 318 // differently. To enable this conversion, call
 319 // stbi_convert_iphone_png_to_rgb(1).
 320 //
 321 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
 322 // pixel to remove any premultiplied alpha *only* if the image file explicitly
 323 // says there's premultiplied data (currently only happens in iPhone images,
 324 // and only if iPhone convert-to-rgb processing is on).
 325 //
 326 // ===========================================================================
 327 //
 328 // ADDITIONAL CONFIGURATION
 329 //
 330 //  - You can suppress implementation of any of the decoders to reduce
 331 //    your code footprint by #defining one or more of the following
 332 //    symbols before creating the implementation.
 333 //
 334 //        STBI_NO_JPEG
 335 //        STBI_NO_PNG
 336 //        STBI_NO_BMP
 337 //        STBI_NO_PSD
 338 //        STBI_NO_TGA
 339 //        STBI_NO_GIF
 340 //        STBI_NO_HDR
 341 //        STBI_NO_PIC
 342 //        STBI_NO_PNM   (.ppm and .pgm)
 343 //
 344 //  - You can request *only* certain decoders and suppress all other ones
 345 //    (this will be more forward-compatible, as addition of new decoders
 346 //    doesn't require you to disable them explicitly):
 347 //
 348 //        STBI_ONLY_JPEG
 349 //        STBI_ONLY_PNG
 350 //        STBI_ONLY_BMP
 351 //        STBI_ONLY_PSD
 352 //        STBI_ONLY_TGA
 353 //        STBI_ONLY_GIF
 354 //        STBI_ONLY_HDR
 355 //        STBI_ONLY_PIC
 356 //        STBI_ONLY_PNM   (.ppm and .pgm)
 357 //
 358 //   - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
 359 //     want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
 360 //
 361 //  - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
 362 //    than that size (in either width or height) without further processing.
 363 //    This is to let programs in the wild set an upper bound to prevent
 364 //    denial-of-service attacks on untrusted data, as one could generate a
 365 //    valid image of gigantic dimensions and force stb_image to allocate a
 366 //    huge block of memory and spend disproportionate time decoding it. By
 367 //    default this is set to (1 << 24), which is 16777216, but that's still
 368 //    very big.
 369
 370 #ifndef STBI_NO_STDIO
 371 #include <stdio.h>
 372 #endif // STBI_NO_STDIO
 373
 374 #define STBI_VERSION 1
 375
 376 enum
 377 {
 378    STBI_default = 0, // only used for desired_channels
 379
 380    STBI_grey       = 1,
 381    STBI_grey_alpha = 2,
 382    STBI_rgb        = 3,
 383    STBI_rgb_alpha  = 4
 384 };
 385
 386 #include <stdlib.h>
 387 typedef unsigned char stbi_uc;
 388 typedef unsigned short stbi_us;
 389
 390 #ifdef __cplusplus
 391 extern "C" {
 392 #endif
 393
 394 #ifndef STBIDEF
 395 #ifdef STB_IMAGE_STATIC
 396 #define STBIDEF static
 397 #else
 398 #define STBIDEF extern
 399 #endif
 400 #endif
 401
 402 //////////////////////////////////////////////////////////////////////////////
 403 //
 404 // PRIMARY API - works on images of any type
 405 //
 406
 407 //
 408 // load image by filename, open file, or memory buffer
 409 //
 410
 411 typedef struct
 412 {
 413    int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
 414    void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
 415    int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
 416 } stbi_io_callbacks;
 417
 418 ////////////////////////////////////
 419 //
 420 // 8-bits-per-channel interface
 421 //
 422
 423 STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *channels_in_file, int desired_channels);
 424 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *channels_in_file, int desired_channels);
 425
 426 #ifndef STBI_NO_STDIO
 427 STBIDEF stbi_uc *stbi_load            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
 428 STBIDEF stbi_uc *stbi_load_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
 429 // for stbi_load_from_file, file pointer is left pointing immediately after image
 430 #endif
 431
 432 #ifndef STBI_NO_GIF
 433 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
 434 #endif
 435
 436 #ifdef STBI_WINDOWS_UTF8
 437 STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
 438 #endif
 439
 440 ////////////////////////////////////
 441 //
 442 // 16-bits-per-channel interface
 443 //
 444
 445 STBIDEF stbi_us *stbi_load_16_from_memory   (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
 446 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
 447
 448 #ifndef STBI_NO_STDIO
 449 STBIDEF stbi_us *stbi_load_16          (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
 450 STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
 451 #endif
 452
 453 ////////////////////////////////////
 454 //
 455 // float-per-channel interface
 456 //
 457 #ifndef STBI_NO_LINEAR
 458    STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
 459    STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y,  int *channels_in_file, int desired_channels);
 460
 461    #ifndef STBI_NO_STDIO
 462    STBIDEF float *stbi_loadf            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
 463    STBIDEF float *stbi_loadf_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
 464    #endif
 465 #endif
 466
 467 #ifndef STBI_NO_HDR
 468    STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
 469    STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
 470 #endif // STBI_NO_HDR
 471
 472 #ifndef STBI_NO_LINEAR
 473    STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
 474    STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
 475 #endif // STBI_NO_LINEAR
 476
 477 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
 478 STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
 479 STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
 480 #ifndef STBI_NO_STDIO
 481 STBIDEF int      stbi_is_hdr          (char const *filename);
 482 STBIDEF int      stbi_is_hdr_from_file(FILE *f);
 483 #endif // STBI_NO_STDIO
 484
 485
 486 // get a VERY brief reason for failure
 487 // on most compilers (and ALL modern mainstream compilers) this is threadsafe
 488 STBIDEF const char *stbi_failure_reason  (void);
 489
 490 // free the loaded image -- this is just free()
 491 STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
 492
 493 // get image dimensions & components without fully decoding
 494 STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
 495 STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
 496 STBIDEF int      stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
 497 STBIDEF int      stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
 498
 499 #ifndef STBI_NO_STDIO
 500 STBIDEF int      stbi_info               (char const *filename,     int *x, int *y, int *comp);
 501 STBIDEF int      stbi_info_from_file     (FILE *f,                  int *x, int *y, int *comp);
 502 STBIDEF int      stbi_is_16_bit          (char const *filename);
 503 STBIDEF int      stbi_is_16_bit_from_file(FILE *f);
 504 #endif
 505
 506
 507
 508 // for image formats that explicitly notate that they have premultiplied alpha,
 509 // we just return the colors as stored in the file. set this flag to force
 510 // unpremultiplication. results are undefined if the unpremultiply overflow.
 511 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
 512
 513 // indicate whether we should process iphone images back to canonical format,
 514 // or just pass them through "as-is"
 515 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
 516
 517 // flip the image vertically, so the first pixel in the output array is the bottom left
 518 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
 519
 520 // as above, but only applies to images loaded on the thread that calls the function
 521 // this function is only available if your compiler supports thread-local variables;
 522 // calling it will fail to link if your compiler doesn't
 523 STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply);
 524 STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert);
 525 STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
 526
 527 // ZLIB client - used by PNG, available for other purposes
 528
 529 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
 530 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
 531 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
 532 STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
 533
 534 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
 535 STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
 536
 537
 538 #ifdef __cplusplus
 539 }
 540 #endif
 541
 542 //
 543 //
 544 ////   end header file   /////////////////////////////////////////////////////
 545 #endif // STBI_INCLUDE_STB_IMAGE_H
 546
 547 #ifdef STB_IMAGE_IMPLEMENTATION
 548
 549 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
 550   || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
 551   || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
 552   || defined(STBI_ONLY_ZLIB)
 553    #ifndef STBI_ONLY_JPEG
 554    #define STBI_NO_JPEG
 555    #endif
 556    #ifndef STBI_ONLY_PNG
 557    #define STBI_NO_PNG
 558    #endif
 559    #ifndef STBI_ONLY_BMP
 560    #define STBI_NO_BMP
 561    #endif
 562    #ifndef STBI_ONLY_PSD
 563    #define STBI_NO_PSD
 564    #endif
 565    #ifndef STBI_ONLY_TGA
 566    #define STBI_NO_TGA
 567    #endif
 568    #ifndef STBI_ONLY_GIF
 569    #define STBI_NO_GIF
 570    #endif
 571    #ifndef STBI_ONLY_HDR
 572    #define STBI_NO_HDR
 573    #endif
 574    #ifndef STBI_ONLY_PIC
 575    #define STBI_NO_PIC
 576    #endif
 577    #ifndef STBI_ONLY_PNM
 578    #define STBI_NO_PNM
 579    #endif
 580 #endif
 581
 582 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
 583 #define STBI_NO_ZLIB
 584 #endif
 585
 586
 587 #include <stdarg.h>
 588 #include <stddef.h> // ptrdiff_t on osx
 589 #include <stdlib.h>
 590 #include <string.h>
 591 #include <limits.h>
 592
 593 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
 594 #include <math.h>  // ldexp, pow
 595 #endif
 596
 597 #ifndef STBI_NO_STDIO
 598 #include <stdio.h>
 599 #endif
 600
 601 #ifndef STBI_ASSERT
 602 #include <assert.h>
 603 #define STBI_ASSERT(x) assert(x)
 604 #endif
 605
 606 #ifdef __cplusplus
 607 #define STBI_EXTERN extern "C"
 608 #else
 609 #define STBI_EXTERN extern
 610 #endif
 611
 612
 613 #ifndef _MSC_VER
 614    #ifdef __cplusplus
 615    #define stbi_inline inline
 616    #else
 617    #define stbi_inline
 618    #endif
 619 #else
 620    #define stbi_inline __forceinline
 621 #endif
 622
 623 #ifndef STBI_NO_THREAD_LOCALS
 624    #if defined(__cplusplus) &&  __cplusplus >= 201103L
 625       #define STBI_THREAD_LOCAL       thread_local
 626    #elif defined(__GNUC__) && __GNUC__ < 5
 627       #define STBI_THREAD_LOCAL       __thread
 628    #elif defined(_MSC_VER)
 629       #define STBI_THREAD_LOCAL       __declspec(thread)
 630    #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
 631       #define STBI_THREAD_LOCAL       _Thread_local
 632    #endif
 633
 634    #ifndef STBI_THREAD_LOCAL
 635       #if defined(__GNUC__)
 636         #define STBI_THREAD_LOCAL       __thread
 637       #endif
 638    #endif
 639 #endif
 640
 641 #if defined(_MSC_VER) || defined(__SYMBIAN32__)
 642 typedef unsigned short stbi__uint16;
 643 typedef   signed short stbi__int16;
 644 typedef unsigned int   stbi__uint32;
 645 typedef   signed int   stbi__int32;
 646 #else
 647 #include <stdint.h>
 648 typedef uint16_t stbi__uint16;
 649 typedef int16_t  stbi__int16;
 650 typedef uint32_t stbi__uint32;
 651 typedef int32_t  stbi__int32;
 652 #endif
 653
 654 // should produce compiler error if size is wrong
 655 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
 656
 657 #ifdef _MSC_VER
 658 #define STBI_NOTUSED(v)  (void)(v)
 659 #else
 660 #define STBI_NOTUSED(v)  (void)sizeof(v)
 661 #endif
 662
 663 #ifdef _MSC_VER
 664 #define STBI_HAS_LROTL
 665 #endif
 666
 667 #ifdef STBI_HAS_LROTL
 668    #define stbi_lrot(x,y)  _lrotl(x,y)
 669 #else
 670    #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (-(y) & 31)))
 671 #endif
 672
 673 #if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
 674 // ok
 675 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
 676 // ok
 677 #else
 678 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
 679 #endif
 680
 681 #ifndef STBI_MALLOC
 682 #define STBI_MALLOC(sz)           malloc(sz)
 683 #define STBI_REALLOC(p,newsz)     realloc(p,newsz)
 684 #define STBI_FREE(p)              free(p)
 685 #endif
 686
 687 #ifndef STBI_REALLOC_SIZED
 688 #define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
 689 #endif
 690
 691 // x86/x64 detection
 692 #if defined(__x86_64__) || defined(_M_X64)
 693 #define STBI__X64_TARGET
 694 #elif defined(__i386) || defined(_M_IX86)
 695 #define STBI__X86_TARGET
 696 #endif
 697
 698 #if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
 699 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
 700 // which in turn means it gets to use SSE2 everywhere. This is unfortunate,
 701 // but previous attempts to provide the SSE2 functions with runtime
 702 // detection caused numerous issues. The way architecture extensions are
 703 // exposed in GCC/Clang is, sadly, not really suited for one-file libs.
 704 // New behavior: if compiled with -msse2, we use SSE2 without any
 705 // detection; if not, we don't use it at all.
 706 #define STBI_NO_SIMD
 707 #endif
 708
 709 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
 710 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
 711 //
 712 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
 713 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
 714 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
 715 // simultaneously enabling "-mstackrealign".
 716 //
 717 // See https://github.com/nothings/stb/issues/81 for more information.
 718 //
 719 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
 720 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
 721 #define STBI_NO_SIMD
 722 #endif
 723
 724 #if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
 725 #define STBI_SSE2
 726 #include <emmintrin.h>
 727
 728 #ifdef _MSC_VER
 729
 730 #if _MSC_VER >= 1400  // not VC6
 731 #include <intrin.h> // __cpuid
 732 static int stbi__cpuid3(void)
 733 {
 734    int info[4];
 735    __cpuid(info,1);
 736    return info[3];
 737 }
 738 #else
 739 static int stbi__cpuid3(void)
 740 {
 741    int res;
 742    __asm {
 743       mov  eax,1
 744       cpuid
 745       mov  res,edx
 746    }
 747    return res;
 748 }
 749 #endif
 750
 751 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
 752
 753 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
 754 static int stbi__sse2_available(void)
 755 {
 756    int info3 = stbi__cpuid3();
 757    return ((info3 >> 26) & 1) != 0;
 758 }
 759 #endif
 760
 761 #else // assume GCC-style if not VC++
 762 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
 763
 764 #if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
 765 static int stbi__sse2_available(void)
 766 {
 767    // If we're even attempting to compile this on GCC/Clang, that means
 768    // -msse2 is on, which means the compiler is allowed to use SSE2
 769    // instructions at will, and so are we.
 770    return 1;
 771 }
 772 #endif
 773
 774 #endif
 775 #endif
 776
 777 // ARM NEON
 778 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
 779 #undef STBI_NEON
 780 #endif
 781
 782 #ifdef STBI_NEON
 783 #include <arm_neon.h>
 784 #ifdef _MSC_VER
 785 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
 786 #else
 787 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
 788 #endif
 789 #endif
 790
 791 #ifndef STBI_SIMD_ALIGN
 792 #define STBI_SIMD_ALIGN(type, name) type name
 793 #endif
 794
 795 #ifndef STBI_MAX_DIMENSIONS
 796 #define STBI_MAX_DIMENSIONS (1 << 24)
 797 #endif
 798
 799 ///////////////////////////////////////////////
 800 //
 801 //  stbi__context struct and start_xxx functions
 802
 803 // stbi__context structure is our basic context used by all images, so it
 804 // contains all the IO context, plus some basic image information
 805 typedef struct
 806 {
 807    stbi__uint32 img_x, img_y;
 808    int img_n, img_out_n;
 809
 810    stbi_io_callbacks io;
 811    void *io_user_data;
 812
 813    int read_from_callbacks;
 814    int buflen;
 815    stbi_uc buffer_start[128];
 816    int callback_already_read;
 817
 818    stbi_uc *img_buffer, *img_buffer_end;
 819    stbi_uc *img_buffer_original, *img_buffer_original_end;
 820 } stbi__context;
 821
 822
 823 static void stbi__refill_buffer(stbi__context *s);
 824
 825 // initialize a memory-decode context
 826 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
 827 {
 828    s->io.read = NULL;
 829    s->read_from_callbacks = 0;
 830    s->callback_already_read = 0;
 831    s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
 832    s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
 833 }
 834
 835 // initialize a callback-based context
 836 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
 837 {
 838    s->io = *c;
 839    s->io_user_data = user;
 840    s->buflen = sizeof(s->buffer_start);
 841    s->read_from_callbacks = 1;
 842    s->callback_already_read = 0;
 843    s->img_buffer = s->img_buffer_original = s->buffer_start;
 844    stbi__refill_buffer(s);
 845    s->img_buffer_original_end = s->img_buffer_end;
 846 }
 847
 848 #ifndef STBI_NO_STDIO
 849
 850 static int stbi__stdio_read(void *user, char *data, int size)
 851 {
 852    return (int) fread(data,1,size,(FILE*) user);
 853 }
 854
 855 static void stbi__stdio_skip(void *user, int n)
 856 {
 857    int ch;
 858    fseek((FILE*) user, n, SEEK_CUR);
 859    ch = fgetc((FILE*) user);  /* have to read a byte to reset feof()'s flag */
 860    if (ch != EOF) {
 861       ungetc(ch, (FILE *) user);  /* push byte back onto stream if valid. */
 862    }
 863 }
 864
 865 static int stbi__stdio_eof(void *user)
 866 {
 867    return feof((FILE*) user) || ferror((FILE *) user);
 868 }
 869
 870 static stbi_io_callbacks stbi__stdio_callbacks =
 871 {
 872    stbi__stdio_read,
 873    stbi__stdio_skip,
 874    stbi__stdio_eof,
 875 };
 876
 877 static void stbi__start_file(stbi__context *s, FILE *f)
 878 {
 879    stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
 880 }
 881
 882 //static void stop_file(stbi__context *s) { }
 883
 884 #endif // !STBI_NO_STDIO
 885
 886 static void stbi__rewind(stbi__context *s)
 887 {
 888    // conceptually rewind SHOULD rewind to the beginning of the stream,
 889    // but we just rewind to the beginning of the initial buffer, because
 890    // we only use it after doing 'test', which only ever looks at at most 92 bytes
 891    s->img_buffer = s->img_buffer_original;
 892    s->img_buffer_end = s->img_buffer_original_end;
 893 }
 894
 895 enum
 896 {
 897    STBI_ORDER_RGB,
 898    STBI_ORDER_BGR
 899 };
 900
 901 typedef struct
 902 {
 903    int bits_per_channel;
 904    int num_channels;
 905    int channel_order;
 906 } stbi__result_info;
 907
 908 #ifndef STBI_NO_JPEG
 909 static int      stbi__jpeg_test(stbi__context *s);
 910 static void    *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 911 static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
 912 #endif
 913
 914 #ifndef STBI_NO_PNG
 915 static int      stbi__png_test(stbi__context *s);
 916 static void    *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 917 static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
 918 static int      stbi__png_is16(stbi__context *s);
 919 #endif
 920
 921 #ifndef STBI_NO_BMP
 922 static int      stbi__bmp_test(stbi__context *s);
 923 static void    *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 924 static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
 925 #endif
 926
 927 #ifndef STBI_NO_TGA
 928 static int      stbi__tga_test(stbi__context *s);
 929 static void    *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 930 static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
 931 #endif
 932
 933 #ifndef STBI_NO_PSD
 934 static int      stbi__psd_test(stbi__context *s);
 935 static void    *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
 936 static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
 937 static int      stbi__psd_is16(stbi__context *s);
 938 #endif
 939
 940 #ifndef STBI_NO_HDR
 941 static int      stbi__hdr_test(stbi__context *s);
 942 static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 943 static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
 944 #endif
 945
 946 #ifndef STBI_NO_PIC
 947 static int      stbi__pic_test(stbi__context *s);
 948 static void    *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 949 static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
 950 #endif
 951
 952 #ifndef STBI_NO_GIF
 953 static int      stbi__gif_test(stbi__context *s);
 954 static void    *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 955 static void    *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
 956 static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
 957 #endif
 958
 959 #ifndef STBI_NO_PNM
 960 static int      stbi__pnm_test(stbi__context *s);
 961 static void    *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
 962 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
 963 static int      stbi__pnm_is16(stbi__context *s);
 964 #endif
 965
 966 static
 967 #ifdef STBI_THREAD_LOCAL
 968 STBI_THREAD_LOCAL
 969 #endif
 970 const char *stbi__g_failure_reason;
 971
 972 STBIDEF const char *stbi_failure_reason(void)
 973 {
 974    return stbi__g_failure_reason;
 975 }
 976
 977 #ifndef STBI_NO_FAILURE_STRINGS
 978 static int stbi__err(const char *str)
 979 {
 980    stbi__g_failure_reason = str;
 981    return 0;
 982 }
 983 #endif
 984
 985 static void *stbi__malloc(size_t size)
 986 {
 987     return STBI_MALLOC(size);
 988 }
 989
 990 // stb_image uses ints pervasively, including for offset calculations.
 991 // therefore the largest decoded image size we can support with the
 992 // current code, even on 64-bit targets, is INT_MAX. this is not a
 993 // significant limitation for the intended use case.
 994 //
 995 // we do, however, need to make sure our size calculations don't
 996 // overflow. hence a few helper functions for size calculations that
 997 // multiply integers together, making sure that they're non-negative
 998 // and no overflow occurs.
 999
1000 // return 1 if the sum is valid, 0 on overflow.
1001 // negative terms are considered invalid.
1002 static int stbi__addsizes_valid(int a, int b)
1003 {
1004    if (b < 0) return 0;
1005    // now 0 <= b <= INT_MAX, hence also
1006    // 0 <= INT_MAX - b <= INTMAX.
1007    // And "a + b <= INT_MAX" (which might overflow) is the
1008    // same as a <= INT_MAX - b (no overflow)
1009    return a <= INT_MAX - b;
1010 }
1011
1012 // returns 1 if the product is valid, 0 on overflow.
1013 // negative factors are considered invalid.
1014 static int stbi__mul2sizes_valid(int a, int b)
1015 {
1016    if (a < 0 || b < 0) return 0;
1017    if (b == 0) return 1; // mul-by-0 is always safe
1018    // portable way to check for no overflows in a*b
1019    return a <= INT_MAX/b;
1020 }
1021
1022 #if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
1023 // returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
1024 static int stbi__mad2sizes_valid(int a, int b, int add)
1025 {
1026    return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
1027 }
1028 #endif
1029
1030 // returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
1031 static int stbi__mad3sizes_valid(int a, int b, int c, int add)
1032 {
1033    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
1034       stbi__addsizes_valid(a*b*c, add);
1035 }
1036
1037 // returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
1038 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
1039 static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
1040 {
1041    return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
1042       stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
1043 }
1044 #endif
1045
1046 #if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
1047 // mallocs with size overflow checking
1048 static void *stbi__malloc_mad2(int a, int b, int add)
1049 {
1050    if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
1051    return stbi__malloc(a*b + add);
1052 }
1053 #endif
1054
1055 static void *stbi__malloc_mad3(int a, int b, int c, int add)
1056 {
1057    if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
1058    return stbi__malloc(a*b*c + add);
1059 }
1060
1061 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
1062 static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
1063 {
1064    if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
1065    return stbi__malloc(a*b*c*d + add);
1066 }
1067 #endif
1068
1069 // returns 1 if the sum of two signed ints is valid (between -2^31 and 2^31-1 inclusive), 0 on overflow.
1070 static int stbi__addints_valid(int a, int b)
1071 {
1072    if ((a >= 0) != (b >= 0)) return 1; // a and b have different signs, so no overflow
1073    if (a < 0 && b < 0) return a >= INT_MIN - b; // same as a + b >= INT_MIN; INT_MIN - b cannot overflow since b < 0.
1074    return a <= INT_MAX - b;
1075 }
1076
1077 // returns 1 if the product of two ints fits in a signed short, 0 on overflow.
1078 static int stbi__mul2shorts_valid(int a, int b)
1079 {
1080    if (b == 0 || b == -1) return 1; // multiplication by 0 is always 0; check for -1 so SHRT_MIN/b doesn't overflow
1081    if ((a >= 0) == (b >= 0)) return a <= SHRT_MAX/b; // product is positive, so similar to mul2sizes_valid
1082    if (b < 0) return a <= SHRT_MIN / b; // same as a * b >= SHRT_MIN
1083    return a >= SHRT_MIN / b;
1084 }
1085
1086 // stbi__err - error
1087 // stbi__errpf - error returning pointer to float
1088 // stbi__errpuc - error returning pointer to unsigned char
1089
1090 #ifdef STBI_NO_FAILURE_STRINGS
1091    #define stbi__err(x,y)  0
1092 #elif defined(STBI_FAILURE_USERMSG)
1093    #define stbi__err(x,y)  stbi__err(y)
1094 #else
1095    #define stbi__err(x,y)  stbi__err(x)
1096 #endif
1097
1098 #define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
1099 #define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
1100
1101 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
1102 {
1103    STBI_FREE(retval_from_stbi_load);
1104 }
1105
1106 #ifndef STBI_NO_LINEAR
1107 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
1108 #endif
1109
1110 #ifndef STBI_NO_HDR
1111 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
1112 #endif
1113
1114 static int stbi__vertically_flip_on_load_global = 0;
1115
1116 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
1117 {
1118    stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
1119 }
1120
1121 #ifndef STBI_THREAD_LOCAL
1122 #define stbi__vertically_flip_on_load  stbi__vertically_flip_on_load_global
1123 #else
1124 static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set;
1125
1126 STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
1127 {
1128    stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
1129    stbi__vertically_flip_on_load_set = 1;
1130 }
1131
1132 #define stbi__vertically_flip_on_load  (stbi__vertically_flip_on_load_set       \
1133                                          ? stbi__vertically_flip_on_load_local  \
1134                                          : stbi__vertically_flip_on_load_global)
1135 #endif // STBI_THREAD_LOCAL
1136
1137 static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
1138 {
1139    memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
1140    ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
1141    ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
1142    ri->num_channels = 0;
1143
1144    // test the formats with a very explicit header first (at least a FOURCC
1145    // or distinctive magic number first)
1146    #ifndef STBI_NO_PNG
1147    if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp, ri);
1148    #endif
1149    #ifndef STBI_NO_BMP
1150    if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp, ri);
1151    #endif
1152    #ifndef STBI_NO_GIF
1153    if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp, ri);
1154    #endif
1155    #ifndef STBI_NO_PSD
1156    if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
1157    #else
1158    STBI_NOTUSED(bpc);
1159    #endif
1160    #ifndef STBI_NO_PIC
1161    if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp, ri);
1162    #endif
1163
1164    // then the formats that can end up attempting to load with just 1 or 2
1165    // bytes matching expectations; these are prone to false positives, so
1166    // try them later
1167    #ifndef STBI_NO_JPEG
1168    if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
1169    #endif
1170    #ifndef STBI_NO_PNM
1171    if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp, ri);
1172    #endif
1173
1174    #ifndef STBI_NO_HDR
1175    if (stbi__hdr_test(s)) {
1176       float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
1177       return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
1178    }
1179    #endif
1180
1181    #ifndef STBI_NO_TGA
1182    // test tga last because it's a crappy test!
1183    if (stbi__tga_test(s))
1184       return stbi__tga_load(s,x,y,comp,req_comp, ri);
1185    #endif
1186
1187    return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
1188 }
1189
1190 static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
1191 {
1192    int i;
1193    int img_len = w * h * channels;
1194    stbi_uc *reduced;
1195
1196    reduced = (stbi_uc *) stbi__malloc(img_len);
1197    if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
1198
1199    for (i = 0; i < img_len; ++i)
1200       reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top half of each byte is sufficient approx of 16->8 bit scaling
1201
1202    STBI_FREE(orig);
1203    return reduced;
1204 }
1205
1206 static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
1207 {
1208    int i;
1209    int img_len = w * h * channels;
1210    stbi__uint16 *enlarged;
1211
1212    enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
1213    if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
1214
1215    for (i = 0; i < img_len; ++i)
1216       enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
1217
1218    STBI_FREE(orig);
1219    return enlarged;
1220 }
1221
1222 static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
1223 {
1224    int row;
1225    size_t bytes_per_row = (size_t)w * bytes_per_pixel;
1226    stbi_uc temp[2048];
1227    stbi_uc *bytes = (stbi_uc *)image;
1228
1229    for (row = 0; row < (h>>1); row++) {
1230       stbi_uc *row0 = bytes + row*bytes_per_row;
1231       stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
1232       // swap row0 with row1
1233       size_t bytes_left = bytes_per_row;
1234       while (bytes_left) {
1235          size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
1236          memcpy(temp, row0, bytes_copy);
1237          memcpy(row0, row1, bytes_copy);
1238          memcpy(row1, temp, bytes_copy);
1239          row0 += bytes_copy;
1240          row1 += bytes_copy;
1241          bytes_left -= bytes_copy;
1242       }
1243    }
1244 }
1245
1246 #ifndef STBI_NO_GIF
1247 static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
1248 {
1249    int slice;
1250    int slice_size = w * h * bytes_per_pixel;
1251
1252    stbi_uc *bytes = (stbi_uc *)image;
1253    for (slice = 0; slice < z; ++slice) {
1254       stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
1255       bytes += slice_size;
1256    }
1257 }
1258 #endif
1259
1260 static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1261 {
1262    stbi__result_info ri;
1263    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
1264
1265    if (result == NULL)
1266       return NULL;
1267
1268    // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
1269    STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
1270
1271    if (ri.bits_per_channel != 8) {
1272       result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
1273       ri.bits_per_channel = 8;
1274    }
1275
1276    // @TODO: move stbi__convert_format to here
1277
1278    if (stbi__vertically_flip_on_load) {
1279       int channels = req_comp ? req_comp : *comp;
1280       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
1281    }
1282
1283    return (unsigned char *) result;
1284 }
1285
1286 static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1287 {
1288    stbi__result_info ri;
1289    void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
1290
1291    if (result == NULL)
1292       return NULL;
1293
1294    // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
1295    STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
1296
1297    if (ri.bits_per_channel != 16) {
1298       result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
1299       ri.bits_per_channel = 16;
1300    }
1301
1302    // @TODO: move stbi__convert_format16 to here
1303    // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
1304
1305    if (stbi__vertically_flip_on_load) {
1306       int channels = req_comp ? req_comp : *comp;
1307       stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
1308    }
1309
1310    return (stbi__uint16 *) result;
1311 }
1312
1313 #if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
1314 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1315 {
1316    if (stbi__vertically_flip_on_load && result != NULL) {
1317       int channels = req_comp ? req_comp : *comp;
1318       stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
1319    }
1320 }
1321 #endif
1322
1323 #ifndef STBI_NO_STDIO
1324
1325 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1326 STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
1327 STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
1328 #endif
1329
1330 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1331 STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
1332 {
1333         return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
1334 }
1335 #endif
1336
1337 static FILE *stbi__fopen(char const *filename, char const *mode)
1338 {
1339    FILE *f;
1340 #if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
1341    wchar_t wMode[64];
1342    wchar_t wFilename[1024];
1343         if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
1344       return 0;
1345
1346         if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
1347       return 0;
1348
1349 #if defined(_MSC_VER) && _MSC_VER >= 1400
1350         if (0 != _wfopen_s(&f, wFilename, wMode))
1351                 f = 0;
1352 #else
1353    f = _wfopen(wFilename, wMode);
1354 #endif
1355
1356 #elif defined(_MSC_VER) && _MSC_VER >= 1400
1357    if (0 != fopen_s(&f, filename, mode))
1358       f=0;
1359 #else
1360    f = fopen(filename, mode);
1361 #endif
1362    return f;
1363 }
1364
1365
1366 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1367 {
1368    FILE *f = stbi__fopen(filename, "rb");
1369    unsigned char *result;
1370    if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1371    result = stbi_load_from_file(f,x,y,comp,req_comp);
1372    fclose(f);
1373    return result;
1374 }
1375
1376 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1377 {
1378    unsigned char *result;
1379    stbi__context s;
1380    stbi__start_file(&s,f);
1381    result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1382    if (result) {
1383       // need to 'unget' all the characters in the IO buffer
1384       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1385    }
1386    return result;
1387 }
1388
1389 STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
1390 {
1391    stbi__uint16 *result;
1392    stbi__context s;
1393    stbi__start_file(&s,f);
1394    result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
1395    if (result) {
1396       // need to 'unget' all the characters in the IO buffer
1397       fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1398    }
1399    return result;
1400 }
1401
1402 STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
1403 {
1404    FILE *f = stbi__fopen(filename, "rb");
1405    stbi__uint16 *result;
1406    if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
1407    result = stbi_load_from_file_16(f,x,y,comp,req_comp);
1408    fclose(f);
1409    return result;
1410 }
1411
1412
1413 #endif //!STBI_NO_STDIO
1414
1415 STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
1416 {
1417    stbi__context s;
1418    stbi__start_mem(&s,buffer,len);
1419    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
1420 }
1421
1422 STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
1423 {
1424    stbi__context s;
1425    stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
1426    return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
1427 }
1428
1429 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1430 {
1431    stbi__context s;
1432    stbi__start_mem(&s,buffer,len);
1433    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1434 }
1435
1436 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1437 {
1438    stbi__context s;
1439    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1440    return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1441 }
1442
1443 #ifndef STBI_NO_GIF
1444 STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
1445 {
1446    unsigned char *result;
1447    stbi__context s;
1448    stbi__start_mem(&s,buffer,len);
1449
1450    result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
1451    if (stbi__vertically_flip_on_load) {
1452       stbi__vertical_flip_slices( result, *x, *y, *z, *comp );
1453    }
1454
1455    return result;
1456 }
1457 #endif
1458
1459 #ifndef STBI_NO_LINEAR
1460 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1461 {
1462    unsigned char *data;
1463    #ifndef STBI_NO_HDR
1464    if (stbi__hdr_test(s)) {
1465       stbi__result_info ri;
1466       float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
1467       if (hdr_data)
1468          stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1469       return hdr_data;
1470    }
1471    #endif
1472    data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
1473    if (data)
1474       return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1475    return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1476 }
1477
1478 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1479 {
1480    stbi__context s;
1481    stbi__start_mem(&s,buffer,len);
1482    return stbi__loadf_main(&s,x,y,comp,req_comp);
1483 }
1484
1485 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1486 {
1487    stbi__context s;
1488    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1489    return stbi__loadf_main(&s,x,y,comp,req_comp);
1490 }
1491
1492 #ifndef STBI_NO_STDIO
1493 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1494 {
1495    float *result;
1496    FILE *f = stbi__fopen(filename, "rb");
1497    if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1498    result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1499    fclose(f);
1500    return result;
1501 }
1502
1503 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1504 {
1505    stbi__context s;
1506    stbi__start_file(&s,f);
1507    return stbi__loadf_main(&s,x,y,comp,req_comp);
1508 }
1509 #endif // !STBI_NO_STDIO
1510
1511 #endif // !STBI_NO_LINEAR
1512
1513 // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
1514 // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
1515 // reports false!
1516
1517 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1518 {
1519    #ifndef STBI_NO_HDR
1520    stbi__context s;
1521    stbi__start_mem(&s,buffer,len);
1522    return stbi__hdr_test(&s);
1523    #else
1524    STBI_NOTUSED(buffer);
1525    STBI_NOTUSED(len);
1526    return 0;
1527    #endif
1528 }
1529
1530 #ifndef STBI_NO_STDIO
1531 STBIDEF int      stbi_is_hdr          (char const *filename)
1532 {
1533    FILE *f = stbi__fopen(filename, "rb");
1534    int result=0;
1535    if (f) {
1536       result = stbi_is_hdr_from_file(f);
1537       fclose(f);
1538    }
1539    return result;
1540 }
1541
1542 STBIDEF int stbi_is_hdr_from_file(FILE *f)
1543 {
1544    #ifndef STBI_NO_HDR
1545    long pos = ftell(f);
1546    int res;
1547    stbi__context s;
1548    stbi__start_file(&s,f);
1549    res = stbi__hdr_test(&s);
1550    fseek(f, pos, SEEK_SET);
1551    return res;
1552    #else
1553    STBI_NOTUSED(f);
1554    return 0;
1555    #endif
1556 }
1557 #endif // !STBI_NO_STDIO
1558
1559 STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1560 {
1561    #ifndef STBI_NO_HDR
1562    stbi__context s;
1563    stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1564    return stbi__hdr_test(&s);
1565    #else
1566    STBI_NOTUSED(clbk);
1567    STBI_NOTUSED(user);
1568    return 0;
1569    #endif
1570 }
1571
1572 #ifndef STBI_NO_LINEAR
1573 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1574
1575 STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1576 STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1577 #endif
1578
1579 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1580
1581 STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
1582 STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1583
1584
1585 //////////////////////////////////////////////////////////////////////////////
1586 //
1587 // Common code used by all image loaders
1588 //
1589
1590 enum
1591 {
1592    STBI__SCAN_load=0,
1593    STBI__SCAN_type,
1594    STBI__SCAN_header
1595 };
1596
1597 static void stbi__refill_buffer(stbi__context *s)
1598 {
1599    int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1600    s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original);
1601    if (n == 0) {
1602       // at end of file, treat same as if from memory, but need to handle case
1603       // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1604       s->read_from_callbacks = 0;
1605       s->img_buffer = s->buffer_start;
1606       s->img_buffer_end = s->buffer_start+1;
1607       *s->img_buffer = 0;
1608    } else {
1609       s->img_buffer = s->buffer_start;
1610       s->img_buffer_end = s->buffer_start + n;
1611    }
1612 }
1613
1614 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1615 {
1616    if (s->img_buffer < s->img_buffer_end)
1617       return *s->img_buffer++;
1618    if (s->read_from_callbacks) {
1619       stbi__refill_buffer(s);
1620       return *s->img_buffer++;
1621    }
1622    return 0;
1623 }
1624
1625 #if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1626 // nothing
1627 #else
1628 stbi_inline static int stbi__at_eof(stbi__context *s)
1629 {
1630    if (s->io.read) {
1631       if (!(s->io.eof)(s->io_user_data)) return 0;
1632       // if feof() is true, check if buffer = end
1633       // special case: we've only got the special 0 character at the end
1634       if (s->read_from_callbacks == 0) return 1;
1635    }
1636
1637    return s->img_buffer >= s->img_buffer_end;
1638 }
1639 #endif
1640
1641 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC)
1642 // nothing
1643 #else
1644 static void stbi__skip(stbi__context *s, int n)
1645 {
1646    if (n == 0) return;  // already there!
1647    if (n < 0) {
1648       s->img_buffer = s->img_buffer_end;
1649       return;
1650    }
1651    if (s->io.read) {
1652       int blen = (int) (s->img_buffer_end - s->img_buffer);
1653       if (blen < n) {
1654          s->img_buffer = s->img_buffer_end;
1655          (s->io.skip)(s->io_user_data, n - blen);
1656          return;
1657       }
1658    }
1659    s->img_buffer += n;
1660 }
1661 #endif
1662
1663 #if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM)
1664 // nothing
1665 #else
1666 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1667 {
1668    if (s->io.read) {
1669       int blen = (int) (s->img_buffer_end - s->img_buffer);
1670       if (blen < n) {
1671          int res, count;
1672
1673          memcpy(buffer, s->img_buffer, blen);
1674
1675          count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1676          res = (count == (n-blen));
1677          s->img_buffer = s->img_buffer_end;
1678          return res;
1679       }
1680    }
1681
1682    if (s->img_buffer+n <= s->img_buffer_end) {
1683       memcpy(buffer, s->img_buffer, n);
1684       s->img_buffer += n;
1685       return 1;
1686    } else
1687       return 0;
1688 }
1689 #endif
1690
1691 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
1692 // nothing
1693 #else
1694 static int stbi__get16be(stbi__context *s)
1695 {
1696    int z = stbi__get8(s);
1697    return (z << 8) + stbi__get8(s);
1698 }
1699 #endif
1700
1701 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
1702 // nothing
1703 #else
1704 static stbi__uint32 stbi__get32be(stbi__context *s)
1705 {
1706    stbi__uint32 z = stbi__get16be(s);
1707    return (z << 16) + stbi__get16be(s);
1708 }
1709 #endif
1710
1711 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1712 // nothing
1713 #else
1714 static int stbi__get16le(stbi__context *s)
1715 {
1716    int z = stbi__get8(s);
1717    return z + (stbi__get8(s) << 8);
1718 }
1719 #endif
1720
1721 #ifndef STBI_NO_BMP
1722 static stbi__uint32 stbi__get32le(stbi__context *s)
1723 {
1724    stbi__uint32 z = stbi__get16le(s);
1725    z += (stbi__uint32)stbi__get16le(s) << 16;
1726    return z;
1727 }
1728 #endif
1729
1730 #define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
1731
1732 #if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1733 // nothing
1734 #else
1735 //////////////////////////////////////////////////////////////////////////////
1736 //
1737 //  generic converter from built-in img_n to req_comp
1738 //    individual types do this automatically as much as possible (e.g. jpeg
1739 //    does all cases internally since it needs to colorspace convert anyway,
1740 //    and it never has alpha, so very few cases ). png can automatically
1741 //    interleave an alpha=255 channel, but falls back to this for other cases
1742 //
1743 //  assume data buffer is malloced, so malloc a new one and free that one
1744 //  only failure mode is malloc failing
1745
1746 static stbi_uc stbi__compute_y(int r, int g, int b)
1747 {
1748    return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
1749 }
1750 #endif
1751
1752 #if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
1753 // nothing
1754 #else
1755 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1756 {
1757    int i,j;
1758    unsigned char *good;
1759
1760    if (req_comp == img_n) return data;
1761    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1762
1763    good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
1764    if (good == NULL) {
1765       STBI_FREE(data);
1766       return stbi__errpuc("outofmem", "Out of memory");
1767    }
1768
1769    for (j=0; j < (int) y; ++j) {
1770       unsigned char *src  = data + j * x * img_n   ;
1771       unsigned char *dest = good + j * x * req_comp;
1772
1773       #define STBI__COMBO(a,b)  ((a)*8+(b))
1774       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1775       // convert source image with img_n components to one with req_comp components;
1776       // avoid switch per pixel, so use switch per scanline and massive macros
1777       switch (STBI__COMBO(img_n, req_comp)) {
1778          STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255;                                     } break;
1779          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
1780          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255;                     } break;
1781          STBI__CASE(2,1) { dest[0]=src[0];                                                  } break;
1782          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
1783          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                  } break;
1784          STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255;        } break;
1785          STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
1786          STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255;    } break;
1787          STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
1788          STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break;
1789          STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                    } break;
1790          default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion");
1791       }
1792       #undef STBI__CASE
1793    }
1794
1795    STBI_FREE(data);
1796    return good;
1797 }
1798 #endif
1799
1800 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
1801 // nothing
1802 #else
1803 static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
1804 {
1805    return (stbi__uint16) (((r*77) + (g*150) +  (29*b)) >> 8);
1806 }
1807 #endif
1808
1809 #if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
1810 // nothing
1811 #else
1812 static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1813 {
1814    int i,j;
1815    stbi__uint16 *good;
1816
1817    if (req_comp == img_n) return data;
1818    STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1819
1820    good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
1821    if (good == NULL) {
1822       STBI_FREE(data);
1823       return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
1824    }
1825
1826    for (j=0; j < (int) y; ++j) {
1827       stbi__uint16 *src  = data + j * x * img_n   ;
1828       stbi__uint16 *dest = good + j * x * req_comp;
1829
1830       #define STBI__COMBO(a,b)  ((a)*8+(b))
1831       #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1832       // convert source image with img_n components to one with req_comp components;
1833       // avoid switch per pixel, so use switch per scanline and massive macros
1834       switch (STBI__COMBO(img_n, req_comp)) {
1835          STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff;                                     } break;
1836          STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
1837          STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff;                     } break;
1838          STBI__CASE(2,1) { dest[0]=src[0];                                                     } break;
1839          STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
1840          STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                     } break;
1841          STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff;        } break;
1842          STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
1843          STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break;
1844          STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
1845          STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break;
1846          STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                       } break;
1847          default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion");
1848       }
1849       #undef STBI__CASE
1850    }
1851
1852    STBI_FREE(data);
1853    return good;
1854 }
1855 #endif
1856
1857 #ifndef STBI_NO_LINEAR
1858 static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1859 {
1860    int i,k,n;
1861    float *output;
1862    if (!data) return NULL;
1863    output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
1864    if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1865    // compute number of non-alpha components
1866    if (comp & 1) n = comp; else n = comp-1;
1867    for (i=0; i < x*y; ++i) {
1868       for (k=0; k < n; ++k) {
1869          output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1870       }
1871    }
1872    if (n < comp) {
1873       for (i=0; i < x*y; ++i) {
1874          output[i*comp + n] = data[i*comp + n]/255.0f;
1875       }
1876    }
1877    STBI_FREE(data);
1878    return output;
1879 }
1880 #endif
1881
1882 #ifndef STBI_NO_HDR
1883 #define stbi__float2int(x)   ((int) (x))
1884 static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
1885 {
1886    int i,k,n;
1887    stbi_uc *output;
1888    if (!data) return NULL;
1889    output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
1890    if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1891    // compute number of non-alpha components
1892    if (comp & 1) n = comp; else n = comp-1;
1893    for (i=0; i < x*y; ++i) {
1894       for (k=0; k < n; ++k) {
1895          float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1896          if (z < 0) z = 0;
1897          if (z > 255) z = 255;
1898          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1899       }
1900       if (k < comp) {
1901          float z = data[i*comp+k] * 255 + 0.5f;
1902          if (z < 0) z = 0;
1903          if (z > 255) z = 255;
1904          output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1905       }
1906    }
1907    STBI_FREE(data);
1908    return output;
1909 }
1910 #endif
1911
1912 //////////////////////////////////////////////////////////////////////////////
1913 //
1914 //  "baseline" JPEG/JFIF decoder
1915 //
1916 //    simple implementation
1917 //      - doesn't support delayed output of y-dimension
1918 //      - simple interface (only one output format: 8-bit interleaved RGB)
1919 //      - doesn't try to recover corrupt jpegs
1920 //      - doesn't allow partial loading, loading multiple at once
1921 //      - still fast on x86 (copying globals into locals doesn't help x86)
1922 //      - allocates lots of intermediate memory (full size of all components)
1923 //        - non-interleaved case requires this anyway
1924 //        - allows good upsampling (see next)
1925 //    high-quality
1926 //      - upsampled channels are bilinearly interpolated, even across blocks
1927 //      - quality integer IDCT derived from IJG's 'slow'
1928 //    performance
1929 //      - fast huffman; reasonable integer IDCT
1930 //      - some SIMD kernels for common paths on targets with SSE2/NEON
1931 //      - uses a lot of intermediate memory, could cache poorly
1932
1933 #ifndef STBI_NO_JPEG
1934
1935 // huffman decoding acceleration
1936 #define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
1937
1938 typedef struct
1939 {
1940    stbi_uc  fast[1 << FAST_BITS];
1941    // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1942    stbi__uint16 code[256];
1943    stbi_uc  values[256];
1944    stbi_uc  size[257];
1945    unsigned int maxcode[18];
1946    int    delta[17];   // old 'firstsymbol' - old 'firstcode'
1947 } stbi__huffman;
1948
1949 typedef struct
1950 {
1951    stbi__context *s;
1952    stbi__huffman huff_dc[4];
1953    stbi__huffman huff_ac[4];
1954    stbi__uint16 dequant[4][64];
1955    stbi__int16 fast_ac[4][1 << FAST_BITS];
1956
1957 // sizes for components, interleaved MCUs
1958    int img_h_max, img_v_max;
1959    int img_mcu_x, img_mcu_y;
1960    int img_mcu_w, img_mcu_h;
1961
1962 // definition of jpeg image component
1963    struct
1964    {
1965       int id;
1966       int h,v;
1967       int tq;
1968       int hd,ha;
1969       int dc_pred;
1970
1971       int x,y,w2,h2;
1972       stbi_uc *data;
1973       void *raw_data, *raw_coeff;
1974       stbi_uc *linebuf;
1975       short   *coeff;   // progressive only
1976       int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
1977    } img_comp[4];
1978
1979    stbi__uint32   code_buffer; // jpeg entropy-coded buffer
1980    int            code_bits;   // number of valid bits
1981    unsigned char  marker;      // marker seen while filling entropy buffer
1982    int            nomore;      // flag if we saw a marker so must stop
1983
1984    int            progressive;
1985    int            spec_start;
1986    int            spec_end;
1987    int            succ_high;
1988    int            succ_low;
1989    int            eob_run;
1990    int            jfif;
1991    int            app14_color_transform; // Adobe APP14 tag
1992    int            rgb;
1993
1994    int scan_n, order[4];
1995    int restart_interval, todo;
1996
1997 // kernels
1998    void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1999    void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
2000    stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
2001 } stbi__jpeg;
2002
2003 static int stbi__build_huffman(stbi__huffman *h, int *count)
2004 {
2005    int i,j,k=0;
2006    unsigned int code;
2007    // build size list for each symbol (from JPEG spec)
2008    for (i=0; i < 16; ++i) {
2009       for (j=0; j < count[i]; ++j) {
2010          h->size[k++] = (stbi_uc) (i+1);
2011          if(k >= 257) return stbi__err("bad size list","Corrupt JPEG");
2012       }
2013    }
2014    h->size[k] = 0;
2015
2016    // compute actual symbols (from jpeg spec)
2017    code = 0;
2018    k = 0;
2019    for(j=1; j <= 16; ++j) {
2020       // compute delta to add to code to compute symbol id
2021       h->delta[j] = k - code;
2022       if (h->size[k] == j) {
2023          while (h->size[k] == j)
2024             h->code[k++] = (stbi__uint16) (code++);
2025          if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
2026       }
2027       // compute largest code + 1 for this size, preshifted as needed later
2028       h->maxcode[j] = code << (16-j);
2029       code <<= 1;
2030    }
2031    h->maxcode[j] = 0xffffffff;
2032
2033    // build non-spec acceleration table; 255 is flag for not-accelerated
2034    memset(h->fast, 255, 1 << FAST_BITS);
2035    for (i=0; i < k; ++i) {
2036       int s = h->size[i];
2037       if (s <= FAST_BITS) {
2038          int c = h->code[i] << (FAST_BITS-s);
2039          int m = 1 << (FAST_BITS-s);
2040          for (j=0; j < m; ++j) {
2041             h->fast[c+j] = (stbi_uc) i;
2042          }
2043       }
2044    }
2045    return 1;
2046 }
2047
2048 // build a table that decodes both magnitude and value of small ACs in
2049 // one go.
2050 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
2051 {
2052    int i;
2053    for (i=0; i < (1 << FAST_BITS); ++i) {
2054       stbi_uc fast = h->fast[i];
2055       fast_ac[i] = 0;
2056       if (fast < 255) {
2057          int rs = h->values[fast];
2058          int run = (rs >> 4) & 15;
2059          int magbits = rs & 15;
2060          int len = h->size[fast];
2061
2062          if (magbits && len + magbits <= FAST_BITS) {
2063             // magnitude code followed by receive_extend code
2064             int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
2065             int m = 1 << (magbits - 1);
2066             if (k < m) k += (~0U << magbits) + 1;
2067             // if the result is small enough, we can fit it in fast_ac table
2068             if (k >= -128 && k <= 127)
2069                fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
2070          }
2071       }
2072    }
2073 }
2074
2075 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
2076 {
2077    do {
2078       unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
2079       if (b == 0xff) {
2080          int c = stbi__get8(j->s);
2081          while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
2082          if (c != 0) {
2083             j->marker = (unsigned char) c;
2084             j->nomore = 1;
2085             return;
2086          }
2087       }
2088       j->code_buffer |= b << (24 - j->code_bits);
2089       j->code_bits += 8;
2090    } while (j->code_bits <= 24);
2091 }
2092
2093 // (1 << n) - 1
2094 static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
2095
2096 // decode a jpeg huffman value from the bitstream
2097 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
2098 {
2099    unsigned int temp;
2100    int c,k;
2101
2102    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2103
2104    // look at the top FAST_BITS and determine what symbol ID it is,
2105    // if the code is <= FAST_BITS
2106    c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2107    k = h->fast[c];
2108    if (k < 255) {
2109       int s = h->size[k];
2110       if (s > j->code_bits)
2111          return -1;
2112       j->code_buffer <<= s;
2113       j->code_bits -= s;
2114       return h->values[k];
2115    }
2116
2117    // naive test is to shift the code_buffer down so k bits are
2118    // valid, then test against maxcode. To speed this up, we've
2119    // preshifted maxcode left so that it has (16-k) 0s at the
2120    // end; in other words, regardless of the number of bits, it
2121    // wants to be compared against something shifted to have 16;
2122    // that way we don't need to shift inside the loop.
2123    temp = j->code_buffer >> 16;
2124    for (k=FAST_BITS+1 ; ; ++k)
2125       if (temp < h->maxcode[k])
2126          break;
2127    if (k == 17) {
2128       // error! code not found
2129       j->code_bits -= 16;
2130       return -1;
2131    }
2132
2133    if (k > j->code_bits)
2134       return -1;
2135
2136    // convert the huffman code to the symbol id
2137    c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
2138    if(c < 0 || c >= 256) // symbol id out of bounds!
2139        return -1;
2140    STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
2141
2142    // convert the id to a symbol
2143    j->code_bits -= k;
2144    j->code_buffer <<= k;
2145    return h->values[c];
2146 }
2147
2148 // bias[n] = (-1<<n) + 1
2149 static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
2150
2151 // combined JPEG 'receive' and JPEG 'extend', since baseline
2152 // always extends everything it receives.
2153 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
2154 {
2155    unsigned int k;
2156    int sgn;
2157    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
2158    if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s intead of continuing
2159
2160    sgn = j->code_buffer >> 31; // sign bit always in MSB; 0 if MSB clear (positive), 1 if MSB set (negative)
2161    k = stbi_lrot(j->code_buffer, n);
2162    j->code_buffer = k & ~stbi__bmask[n];
2163    k &= stbi__bmask[n];
2164    j->code_bits -= n;
2165    return k + (stbi__jbias[n] & (sgn - 1));
2166 }
2167
2168 // get some unsigned bits
2169 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
2170 {
2171    unsigned int k;
2172    if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
2173    if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s intead of continuing
2174    k = stbi_lrot(j->code_buffer, n);
2175    j->code_buffer = k & ~stbi__bmask[n];
2176    k &= stbi__bmask[n];
2177    j->code_bits -= n;
2178    return k;
2179 }
2180
2181 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
2182 {
2183    unsigned int k;
2184    if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
2185    if (j->code_bits < 1) return 0; // ran out of bits from stream, return 0s intead of continuing
2186    k = j->code_buffer;
2187    j->code_buffer <<= 1;
2188    --j->code_bits;
2189    return k & 0x80000000;
2190 }
2191
2192 // given a value that's at position X in the zigzag stream,
2193 // where does it appear in the 8x8 matrix coded as row-major?
2194 static const stbi_uc stbi__jpeg_dezigzag[64+15] =
2195 {
2196     0,  1,  8, 16,  9,  2,  3, 10,
2197    17, 24, 32, 25, 18, 11,  4,  5,
2198    12, 19, 26, 33, 40, 48, 41, 34,
2199    27, 20, 13,  6,  7, 14, 21, 28,
2200    35, 42, 49, 56, 57, 50, 43, 36,
2201    29, 22, 15, 23, 30, 37, 44, 51,
2202    58, 59, 52, 45, 38, 31, 39, 46,
2203    53, 60, 61, 54, 47, 55, 62, 63,
2204    // let corrupt input sample past end
2205    63, 63, 63, 63, 63, 63, 63, 63,
2206    63, 63, 63, 63, 63, 63, 63
2207 };
2208
2209 // decode one 64-entry block--
2210 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
2211 {
2212    int diff,dc,k;
2213    int t;
2214
2215    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2216    t = stbi__jpeg_huff_decode(j, hdc);
2217    if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG");
2218
2219    // 0 all the ac values now so we can do it 32-bits at a time
2220    memset(data,0,64*sizeof(data[0]));
2221
2222    diff = t ? stbi__extend_receive(j, t) : 0;
2223    if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta","Corrupt JPEG");
2224    dc = j->img_comp[b].dc_pred + diff;
2225    j->img_comp[b].dc_pred = dc;
2226    if (!stbi__mul2shorts_valid(dc, dequant[0])) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2227    data[0] = (short) (dc * dequant[0]);
2228
2229    // decode AC components, see JPEG spec
2230    k = 1;
2231    do {
2232       unsigned int zig;
2233       int c,r,s;
2234       if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2235       c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2236       r = fac[c];
2237       if (r) { // fast-AC path
2238          k += (r >> 4) & 15; // run
2239          s = r & 15; // combined length
2240          if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
2241          j->code_buffer <<= s;
2242          j->code_bits -= s;
2243          // decode into unzigzag'd location
2244          zig = stbi__jpeg_dezigzag[k++];
2245          data[zig] = (short) ((r >> 8) * dequant[zig]);
2246       } else {
2247          int rs = stbi__jpeg_huff_decode(j, hac);
2248          if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2249          s = rs & 15;
2250          r = rs >> 4;
2251          if (s == 0) {
2252             if (rs != 0xf0) break; // end block
2253             k += 16;
2254          } else {
2255             k += r;
2256             // decode into unzigzag'd location
2257             zig = stbi__jpeg_dezigzag[k++];
2258             data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
2259          }
2260       }
2261    } while (k < 64);
2262    return 1;
2263 }
2264
2265 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
2266 {
2267    int diff,dc;
2268    int t;
2269    if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2270
2271    if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2272
2273    if (j->succ_high == 0) {
2274       // first scan for DC coefficient, must be first
2275       memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
2276       t = stbi__jpeg_huff_decode(j, hdc);
2277       if (t < 0 || t > 15) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2278       diff = t ? stbi__extend_receive(j, t) : 0;
2279
2280       if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta", "Corrupt JPEG");
2281       dc = j->img_comp[b].dc_pred + diff;
2282       j->img_comp[b].dc_pred = dc;
2283       if (!stbi__mul2shorts_valid(dc, 1 << j->succ_low)) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2284       data[0] = (short) (dc * (1 << j->succ_low));
2285    } else {
2286       // refinement scan for DC coefficient
2287       if (stbi__jpeg_get_bit(j))
2288          data[0] += (short) (1 << j->succ_low);
2289    }
2290    return 1;
2291 }
2292
2293 // @OPTIMIZE: store non-zigzagged during the decode passes,
2294 // and only de-zigzag when dequantizing
2295 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
2296 {
2297    int k;
2298    if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2299
2300    if (j->succ_high == 0) {
2301       int shift = j->succ_low;
2302
2303       if (j->eob_run) {
2304          --j->eob_run;
2305          return 1;
2306       }
2307
2308       k = j->spec_start;
2309       do {
2310          unsigned int zig;
2311          int c,r,s;
2312          if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2313          c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2314          r = fac[c];
2315          if (r) { // fast-AC path
2316             k += (r >> 4) & 15; // run
2317             s = r & 15; // combined length
2318             if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
2319             j->code_buffer <<= s;
2320             j->code_bits -= s;
2321             zig = stbi__jpeg_dezigzag[k++];
2322             data[zig] = (short) ((r >> 8) * (1 << shift));
2323          } else {
2324             int rs = stbi__jpeg_huff_decode(j, hac);
2325             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2326             s = rs & 15;
2327             r = rs >> 4;
2328             if (s == 0) {
2329                if (r < 15) {
2330                   j->eob_run = (1 << r);
2331                   if (r)
2332                      j->eob_run += stbi__jpeg_get_bits(j, r);
2333                   --j->eob_run;
2334                   break;
2335                }
2336                k += 16;
2337             } else {
2338                k += r;
2339                zig = stbi__jpeg_dezigzag[k++];
2340                data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift));
2341             }
2342          }
2343       } while (k <= j->spec_end);
2344    } else {
2345       // refinement scan for these AC coefficients
2346
2347       short bit = (short) (1 << j->succ_low);
2348
2349       if (j->eob_run) {
2350          --j->eob_run;
2351          for (k = j->spec_start; k <= j->spec_end; ++k) {
2352             short *p = &data[stbi__jpeg_dezigzag[k]];
2353             if (*p != 0)
2354                if (stbi__jpeg_get_bit(j))
2355                   if ((*p & bit)==0) {
2356                      if (*p > 0)
2357                         *p += bit;
2358                      else
2359                         *p -= bit;
2360                   }
2361          }
2362       } else {
2363          k = j->spec_start;
2364          do {
2365             int r,s;
2366             int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
2367             if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2368             s = rs & 15;
2369             r = rs >> 4;
2370             if (s == 0) {
2371                if (r < 15) {
2372                   j->eob_run = (1 << r) - 1;
2373                   if (r)
2374                      j->eob_run += stbi__jpeg_get_bits(j, r);
2375                   r = 64; // force end of block
2376                } else {
2377                   // r=15 s=0 should write 16 0s, so we just do
2378                   // a run of 15 0s and then write s (which is 0),
2379                   // so we don't have to do anything special here
2380                }
2381             } else {
2382                if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
2383                // sign bit
2384                if (stbi__jpeg_get_bit(j))
2385                   s = bit;
2386                else
2387                   s = -bit;
2388             }
2389
2390             // advance by r
2391             while (k <= j->spec_end) {
2392                short *p = &data[stbi__jpeg_dezigzag[k++]];
2393                if (*p != 0) {
2394                   if (stbi__jpeg_get_bit(j))
2395                      if ((*p & bit)==0) {
2396                         if (*p > 0)
2397                            *p += bit;
2398                         else
2399                            *p -= bit;
2400                      }
2401                } else {
2402                   if (r == 0) {
2403                      *p = (short) s;
2404                      break;
2405                   }
2406                   --r;
2407                }
2408             }
2409          } while (k <= j->spec_end);
2410       }
2411    }
2412    return 1;
2413 }
2414
2415 // take a -128..127 value and stbi__clamp it and convert to 0..255
2416 stbi_inline static stbi_uc stbi__clamp(int x)
2417 {
2418    // trick to use a single test to catch both cases
2419    if ((unsigned int) x > 255) {
2420       if (x < 0) return 0;
2421       if (x > 255) return 255;
2422    }
2423    return (stbi_uc) x;
2424 }
2425
2426 #define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
2427 #define stbi__fsh(x)  ((x) * 4096)
2428
2429 // derived from jidctint -- DCT_ISLOW
2430 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
2431    int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
2432    p2 = s2;                                    \
2433    p3 = s6;                                    \
2434    p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
2435    t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
2436    t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
2437    p2 = s0;                                    \
2438    p3 = s4;                                    \
2439    t0 = stbi__fsh(p2+p3);                      \
2440    t1 = stbi__fsh(p2-p3);                      \
2441    x0 = t0+t3;                                 \
2442    x3 = t0-t3;                                 \
2443    x1 = t1+t2;                                 \
2444    x2 = t1-t2;                                 \
2445    t0 = s7;                                    \
2446    t1 = s5;                                    \
2447    t2 = s3;                                    \
2448    t3 = s1;                                    \
2449    p3 = t0+t2;                                 \
2450    p4 = t1+t3;                                 \
2451    p1 = t0+t3;                                 \
2452    p2 = t1+t2;                                 \
2453    p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
2454    t0 = t0*stbi__f2f( 0.298631336f);           \
2455    t1 = t1*stbi__f2f( 2.053119869f);           \
2456    t2 = t2*stbi__f2f( 3.072711026f);           \
2457    t3 = t3*stbi__f2f( 1.501321110f);           \
2458    p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
2459    p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
2460    p3 = p3*stbi__f2f(-1.961570560f);           \
2461    p4 = p4*stbi__f2f(-0.390180644f);           \
2462    t3 += p1+p4;                                \
2463    t2 += p2+p3;                                \
2464    t1 += p2+p4;                                \
2465    t0 += p1+p3;
2466
2467 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
2468 {
2469    int i,val[64],*v=val;
2470    stbi_uc *o;
2471    short *d = data;
2472
2473    // columns
2474    for (i=0; i < 8; ++i,++d, ++v) {
2475       // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
2476       if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
2477            && d[40]==0 && d[48]==0 && d[56]==0) {
2478          //    no shortcut                 0     seconds
2479          //    (1|2|3|4|5|6|7)==0          0     seconds
2480          //    all separate               -0.047 seconds
2481          //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
2482          int dcterm = d[0]*4;
2483          v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
2484       } else {
2485          STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
2486          // constants scaled things up by 1<<12; let's bring them back
2487          // down, but keep 2 extra bits of precision
2488          x0 += 512; x1 += 512; x2 += 512; x3 += 512;
2489          v[ 0] = (x0+t3) >> 10;
2490          v[56] = (x0-t3) >> 10;
2491          v[ 8] = (x1+t2) >> 10;
2492          v[48] = (x1-t2) >> 10;
2493          v[16] = (x2+t1) >> 10;
2494          v[40] = (x2-t1) >> 10;
2495          v[24] = (x3+t0) >> 10;
2496          v[32] = (x3-t0) >> 10;
2497       }
2498    }
2499
2500    for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
2501       // no fast case since the first 1D IDCT spread components out
2502       STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
2503       // constants scaled things up by 1<<12, plus we had 1<<2 from first
2504       // loop, plus horizontal and vertical each scale by sqrt(8) so together
2505       // we've got an extra 1<<3, so 1<<17 total we need to remove.
2506       // so we want to round that, which means adding 0.5 * 1<<17,
2507       // aka 65536. Also, we'll end up with -128 to 127 that we want
2508       // to encode as 0..255 by adding 128, so we'll add that before the shift
2509       x0 += 65536 + (128<<17);
2510       x1 += 65536 + (128<<17);
2511       x2 += 65536 + (128<<17);
2512       x3 += 65536 + (128<<17);
2513       // tried computing the shifts into temps, or'ing the temps to see
2514       // if any were out of range, but that was slower
2515       o[0] = stbi__clamp((x0+t3) >> 17);
2516       o[7] = stbi__clamp((x0-t3) >> 17);
2517       o[1] = stbi__clamp((x1+t2) >> 17);
2518       o[6] = stbi__clamp((x1-t2) >> 17);
2519       o[2] = stbi__clamp((x2+t1) >> 17);
2520       o[5] = stbi__clamp((x2-t1) >> 17);
2521       o[3] = stbi__clamp((x3+t0) >> 17);
2522       o[4] = stbi__clamp((x3-t0) >> 17);
2523    }
2524 }
2525
2526 #ifdef STBI_SSE2
2527 // sse2 integer IDCT. not the fastest possible implementation but it
2528 // produces bit-identical results to the generic C version so it's
2529 // fully "transparent".
2530 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2531 {
2532    // This is constructed to match our regular (generic) integer IDCT exactly.
2533    __m128i row0, row1, row2, row3, row4, row5, row6, row7;
2534    __m128i tmp;
2535
2536    // dot product constant: even elems=x, odd elems=y
2537    #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
2538
2539    // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
2540    // out(1) = c1[even]*x + c1[odd]*y
2541    #define dct_rot(out0,out1, x,y,c0,c1) \
2542       __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
2543       __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
2544       __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
2545       __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
2546       __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
2547       __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
2548
2549    // out = in << 12  (in 16-bit, out 32-bit)
2550    #define dct_widen(out, in) \
2551       __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
2552       __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
2553
2554    // wide add
2555    #define dct_wadd(out, a, b) \
2556       __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
2557       __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
2558
2559    // wide sub
2560    #define dct_wsub(out, a, b) \
2561       __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
2562       __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
2563
2564    // butterfly a/b, add bias, then shift by "s" and pack
2565    #define dct_bfly32o(out0, out1, a,b,bias,s) \
2566       { \
2567          __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
2568          __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
2569          dct_wadd(sum, abiased, b); \
2570          dct_wsub(dif, abiased, b); \
2571          out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
2572          out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
2573       }
2574
2575    // 8-bit interleave step (for transposes)
2576    #define dct_interleave8(a, b) \
2577       tmp = a; \
2578       a = _mm_unpacklo_epi8(a, b); \
2579       b = _mm_unpackhi_epi8(tmp, b)
2580
2581    // 16-bit interleave step (for transposes)
2582    #define dct_interleave16(a, b) \
2583       tmp = a; \
2584       a = _mm_unpacklo_epi16(a, b); \
2585       b = _mm_unpackhi_epi16(tmp, b)
2586
2587    #define dct_pass(bias,shift) \
2588       { \
2589          /* even part */ \
2590          dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
2591          __m128i sum04 = _mm_add_epi16(row0, row4); \
2592          __m128i dif04 = _mm_sub_epi16(row0, row4); \
2593          dct_widen(t0e, sum04); \
2594          dct_widen(t1e, dif04); \
2595          dct_wadd(x0, t0e, t3e); \
2596          dct_wsub(x3, t0e, t3e); \
2597          dct_wadd(x1, t1e, t2e); \
2598          dct_wsub(x2, t1e, t2e); \
2599          /* odd part */ \
2600          dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
2601          dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
2602          __m128i sum17 = _mm_add_epi16(row1, row7); \
2603          __m128i sum35 = _mm_add_epi16(row3, row5); \
2604          dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
2605          dct_wadd(x4, y0o, y4o); \
2606          dct_wadd(x5, y1o, y5o); \
2607          dct_wadd(x6, y2o, y5o); \
2608          dct_wadd(x7, y3o, y4o); \
2609          dct_bfly32o(row0,row7, x0,x7,bias,shift); \
2610          dct_bfly32o(row1,row6, x1,x6,bias,shift); \
2611          dct_bfly32o(row2,row5, x2,x5,bias,shift); \
2612          dct_bfly32o(row3,row4, x3,x4,bias,shift); \
2613       }
2614
2615    __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2616    __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
2617    __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2618    __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2619    __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
2620    __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
2621    __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
2622    __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
2623
2624    // rounding biases in column/row passes, see stbi__idct_block for explanation.
2625    __m128i bias_0 = _mm_set1_epi32(512);
2626    __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
2627
2628    // load
2629    row0 = _mm_load_si128((const __m128i *) (data + 0*8));
2630    row1 = _mm_load_si128((const __m128i *) (data + 1*8));
2631    row2 = _mm_load_si128((const __m128i *) (data + 2*8));
2632    row3 = _mm_load_si128((const __m128i *) (data + 3*8));
2633    row4 = _mm_load_si128((const __m128i *) (data + 4*8));
2634    row5 = _mm_load_si128((const __m128i *) (data + 5*8));
2635    row6 = _mm_load_si128((const __m128i *) (data + 6*8));
2636    row7 = _mm_load_si128((const __m128i *) (data + 7*8));
2637
2638    // column pass
2639    dct_pass(bias_0, 10);
2640
2641    {
2642       // 16bit 8x8 transpose pass 1
2643       dct_interleave16(row0, row4);
2644       dct_interleave16(row1, row5);
2645       dct_interleave16(row2, row6);
2646       dct_interleave16(row3, row7);
2647
2648       // transpose pass 2
2649       dct_interleave16(row0, row2);
2650       dct_interleave16(row1, row3);
2651       dct_interleave16(row4, row6);
2652       dct_interleave16(row5, row7);
2653
2654       // transpose pass 3
2655       dct_interleave16(row0, row1);
2656       dct_interleave16(row2, row3);
2657       dct_interleave16(row4, row5);
2658       dct_interleave16(row6, row7);
2659    }
2660
2661    // row pass
2662    dct_pass(bias_1, 17);
2663
2664    {
2665       // pack
2666       __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2667       __m128i p1 = _mm_packus_epi16(row2, row3);
2668       __m128i p2 = _mm_packus_epi16(row4, row5);
2669       __m128i p3 = _mm_packus_epi16(row6, row7);
2670
2671       // 8bit 8x8 transpose pass 1
2672       dct_interleave8(p0, p2); // a0e0a1e1...
2673       dct_interleave8(p1, p3); // c0g0c1g1...
2674
2675       // transpose pass 2
2676       dct_interleave8(p0, p1); // a0c0e0g0...
2677       dct_interleave8(p2, p3); // b0d0f0h0...
2678
2679       // transpose pass 3
2680       dct_interleave8(p0, p2); // a0b0c0d0...
2681       dct_interleave8(p1, p3); // a4b4c4d4...
2682
2683       // store
2684       _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
2685       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
2686       _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
2687       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
2688       _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
2689       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
2690       _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
2691       _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
2692    }
2693
2694 #undef dct_const
2695 #undef dct_rot
2696 #undef dct_widen
2697 #undef dct_wadd
2698 #undef dct_wsub
2699 #undef dct_bfly32o
2700 #undef dct_interleave8
2701 #undef dct_interleave16
2702 #undef dct_pass
2703 }
2704
2705 #endif // STBI_SSE2
2706
2707 #ifdef STBI_NEON
2708
2709 // NEON integer IDCT. should produce bit-identical
2710 // results to the generic C version.
2711 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2712 {
2713    int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2714
2715    int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2716    int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2717    int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
2718    int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
2719    int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2720    int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2721    int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2722    int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2723    int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
2724    int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
2725    int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
2726    int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
2727
2728 #define dct_long_mul(out, inq, coeff) \
2729    int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2730    int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2731
2732 #define dct_long_mac(out, acc, inq, coeff) \
2733    int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2734    int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2735
2736 #define dct_widen(out, inq) \
2737    int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2738    int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2739
2740 // wide add
2741 #define dct_wadd(out, a, b) \
2742    int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2743    int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2744
2745 // wide sub
2746 #define dct_wsub(out, a, b) \
2747    int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2748    int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2749
2750 // butterfly a/b, then shift using "shiftop" by "s" and pack
2751 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
2752    { \
2753       dct_wadd(sum, a, b); \
2754       dct_wsub(dif, a, b); \
2755       out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2756       out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2757    }
2758
2759 #define dct_pass(shiftop, shift) \
2760    { \
2761       /* even part */ \
2762       int16x8_t sum26 = vaddq_s16(row2, row6); \
2763       dct_long_mul(p1e, sum26, rot0_0); \
2764       dct_long_mac(t2e, p1e, row6, rot0_1); \
2765       dct_long_mac(t3e, p1e, row2, rot0_2); \
2766       int16x8_t sum04 = vaddq_s16(row0, row4); \
2767       int16x8_t dif04 = vsubq_s16(row0, row4); \
2768       dct_widen(t0e, sum04); \
2769       dct_widen(t1e, dif04); \
2770       dct_wadd(x0, t0e, t3e); \
2771       dct_wsub(x3, t0e, t3e); \
2772       dct_wadd(x1, t1e, t2e); \
2773       dct_wsub(x2, t1e, t2e); \
2774       /* odd part */ \
2775       int16x8_t sum15 = vaddq_s16(row1, row5); \
2776       int16x8_t sum17 = vaddq_s16(row1, row7); \
2777       int16x8_t sum35 = vaddq_s16(row3, row5); \
2778       int16x8_t sum37 = vaddq_s16(row3, row7); \
2779       int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2780       dct_long_mul(p5o, sumodd, rot1_0); \
2781       dct_long_mac(p1o, p5o, sum17, rot1_1); \
2782       dct_long_mac(p2o, p5o, sum35, rot1_2); \
2783       dct_long_mul(p3o, sum37, rot2_0); \
2784       dct_long_mul(p4o, sum15, rot2_1); \
2785       dct_wadd(sump13o, p1o, p3o); \
2786       dct_wadd(sump24o, p2o, p4o); \
2787       dct_wadd(sump23o, p2o, p3o); \
2788       dct_wadd(sump14o, p1o, p4o); \
2789       dct_long_mac(x4, sump13o, row7, rot3_0); \
2790       dct_long_mac(x5, sump24o, row5, rot3_1); \
2791       dct_long_mac(x6, sump23o, row3, rot3_2); \
2792       dct_long_mac(x7, sump14o, row1, rot3_3); \
2793       dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
2794       dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
2795       dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
2796       dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
2797    }
2798
2799    // load
2800    row0 = vld1q_s16(data + 0*8);
2801    row1 = vld1q_s16(data + 1*8);
2802    row2 = vld1q_s16(data + 2*8);
2803    row3 = vld1q_s16(data + 3*8);
2804    row4 = vld1q_s16(data + 4*8);
2805    row5 = vld1q_s16(data + 5*8);
2806    row6 = vld1q_s16(data + 6*8);
2807    row7 = vld1q_s16(data + 7*8);
2808
2809    // add DC bias
2810    row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2811
2812    // column pass
2813    dct_pass(vrshrn_n_s32, 10);
2814
2815    // 16bit 8x8 transpose
2816    {
2817 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2818 // whether compilers actually get this is another story, sadly.
2819 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2820 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2821 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2822
2823       // pass 1
2824       dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2825       dct_trn16(row2, row3);
2826       dct_trn16(row4, row5);
2827       dct_trn16(row6, row7);
2828
2829       // pass 2
2830       dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2831       dct_trn32(row1, row3);
2832       dct_trn32(row4, row6);
2833       dct_trn32(row5, row7);
2834
2835       // pass 3
2836       dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2837       dct_trn64(row1, row5);
2838       dct_trn64(row2, row6);
2839       dct_trn64(row3, row7);
2840
2841 #undef dct_trn16
2842 #undef dct_trn32
2843 #undef dct_trn64
2844    }
2845
2846    // row pass
2847    // vrshrn_n_s32 only supports shifts up to 16, we need
2848    // 17. so do a non-rounding shift of 16 first then follow
2849    // up with a rounding shift by 1.
2850    dct_pass(vshrn_n_s32, 16);
2851
2852    {
2853       // pack and round
2854       uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2855       uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2856       uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2857       uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2858       uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2859       uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2860       uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2861       uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2862
2863       // again, these can translate into one instruction, but often don't.
2864 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2865 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2866 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2867
2868       // sadly can't use interleaved stores here since we only write
2869       // 8 bytes to each scan line!
2870
2871       // 8x8 8-bit transpose pass 1
2872       dct_trn8_8(p0, p1);
2873       dct_trn8_8(p2, p3);
2874       dct_trn8_8(p4, p5);
2875       dct_trn8_8(p6, p7);
2876
2877       // pass 2
2878       dct_trn8_16(p0, p2);
2879       dct_trn8_16(p1, p3);
2880       dct_trn8_16(p4, p6);
2881       dct_trn8_16(p5, p7);
2882
2883       // pass 3
2884       dct_trn8_32(p0, p4);
2885       dct_trn8_32(p1, p5);
2886       dct_trn8_32(p2, p6);
2887       dct_trn8_32(p3, p7);
2888
2889       // store
2890       vst1_u8(out, p0); out += out_stride;
2891       vst1_u8(out, p1); out += out_stride;
2892       vst1_u8(out, p2); out += out_stride;
2893       vst1_u8(out, p3); out += out_stride;
2894       vst1_u8(out, p4); out += out_stride;
2895       vst1_u8(out, p5); out += out_stride;
2896       vst1_u8(out, p6); out += out_stride;
2897       vst1_u8(out, p7);
2898
2899 #undef dct_trn8_8
2900 #undef dct_trn8_16
2901 #undef dct_trn8_32
2902    }
2903
2904 #undef dct_long_mul
2905 #undef dct_long_mac
2906 #undef dct_widen
2907 #undef dct_wadd
2908 #undef dct_wsub
2909 #undef dct_bfly32o
2910 #undef dct_pass
2911 }
2912
2913 #endif // STBI_NEON
2914
2915 #define STBI__MARKER_none  0xff
2916 // if there's a pending marker from the entropy stream, return that
2917 // otherwise, fetch from the stream and get a marker. if there's no
2918 // marker, return 0xff, which is never a valid marker value
2919 static stbi_uc stbi__get_marker(stbi__jpeg *j)
2920 {
2921    stbi_uc x;
2922    if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
2923    x = stbi__get8(j->s);
2924    if (x != 0xff) return STBI__MARKER_none;
2925    while (x == 0xff)
2926       x = stbi__get8(j->s); // consume repeated 0xff fill bytes
2927    return x;
2928 }
2929
2930 // in each scan, we'll have scan_n components, and the order
2931 // of the components is specified by order[]
2932 #define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
2933
2934 // after a restart interval, stbi__jpeg_reset the entropy decoder and
2935 // the dc prediction
2936 static void stbi__jpeg_reset(stbi__jpeg *j)
2937 {
2938    j->code_bits = 0;
2939    j->code_buffer = 0;
2940    j->nomore = 0;
2941    j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
2942    j->marker = STBI__MARKER_none;
2943    j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2944    j->eob_run = 0;
2945    // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
2946    // since we don't even allow 1<<30 pixels
2947 }
2948
2949 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2950 {
2951    stbi__jpeg_reset(z);
2952    if (!z->progressive) {
2953       if (z->scan_n == 1) {
2954          int i,j;
2955          STBI_SIMD_ALIGN(short, data[64]);
2956          int n = z->order[0];
2957          // non-interleaved data, we just need to process one block at a time,
2958          // in trivial scanline order
2959          // number of blocks to do just depends on how many actual "pixels" this
2960          // component has, independent of interleaved MCU blocking and such
2961          int w = (z->img_comp[n].x+7) >> 3;
2962          int h = (z->img_comp[n].y+7) >> 3;
2963          for (j=0; j < h; ++j) {
2964             for (i=0; i < w; ++i) {
2965                int ha = z->img_comp[n].ha;
2966                if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2967                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2968                // every data block is an MCU, so countdown the restart interval
2969                if (--z->todo <= 0) {
2970                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2971                   // if it's NOT a restart, then just bail, so we get corrupt data
2972                   // rather than no data
2973                   if (!STBI__RESTART(z->marker)) return 1;
2974                   stbi__jpeg_reset(z);
2975                }
2976             }
2977          }
2978          return 1;
2979       } else { // interleaved
2980          int i,j,k,x,y;
2981          STBI_SIMD_ALIGN(short, data[64]);
2982          for (j=0; j < z->img_mcu_y; ++j) {
2983             for (i=0; i < z->img_mcu_x; ++i) {
2984                // scan an interleaved mcu... process scan_n components in order
2985                for (k=0; k < z->scan_n; ++k) {
2986                   int n = z->order[k];
2987                   // scan out an mcu's worth of this component; that's just determined
2988                   // by the basic H and V specified for the component
2989                   for (y=0; y < z->img_comp[n].v; ++y) {
2990                      for (x=0; x < z->img_comp[n].h; ++x) {
2991                         int x2 = (i*z->img_comp[n].h + x)*8;
2992                         int y2 = (j*z->img_comp[n].v + y)*8;
2993                         int ha = z->img_comp[n].ha;
2994                         if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2995                         z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2996                      }
2997                   }
2998                }
2999                // after all interleaved components, that's an interleaved MCU,
3000                // so now count down the restart interval
3001                if (--z->todo <= 0) {
3002                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
3003                   if (!STBI__RESTART(z->marker)) return 1;
3004                   stbi__jpeg_reset(z);
3005                }
3006             }
3007          }
3008          return 1;
3009       }
3010    } else {
3011       if (z->scan_n == 1) {
3012          int i,j;
3013          int n = z->order[0];
3014          // non-interleaved data, we just need to process one block at a time,
3015          // in trivial scanline order
3016          // number of blocks to do just depends on how many actual "pixels" this
3017          // component has, independent of interleaved MCU blocking and such
3018          int w = (z->img_comp[n].x+7) >> 3;
3019          int h = (z->img_comp[n].y+7) >> 3;
3020          for (j=0; j < h; ++j) {
3021             for (i=0; i < w; ++i) {
3022                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
3023                if (z->spec_start == 0) {
3024                   if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
3025                      return 0;
3026                } else {
3027                   int ha = z->img_comp[n].ha;
3028                   if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
3029                      return 0;
3030                }
3031                // every data block is an MCU, so countdown the restart interval
3032                if (--z->todo <= 0) {
3033                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
3034                   if (!STBI__RESTART(z->marker)) return 1;
3035                   stbi__jpeg_reset(z);
3036                }
3037             }
3038          }
3039          return 1;
3040       } else { // interleaved
3041          int i,j,k,x,y;
3042          for (j=0; j < z->img_mcu_y; ++j) {
3043             for (i=0; i < z->img_mcu_x; ++i) {
3044                // scan an interleaved mcu... process scan_n components in order
3045                for (k=0; k < z->scan_n; ++k) {
3046                   int n = z->order[k];
3047                   // scan out an mcu's worth of this component; that's just determined
3048                   // by the basic H and V specified for the component
3049                   for (y=0; y < z->img_comp[n].v; ++y) {
3050                      for (x=0; x < z->img_comp[n].h; ++x) {
3051                         int x2 = (i*z->img_comp[n].h + x);
3052                         int y2 = (j*z->img_comp[n].v + y);
3053                         short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
3054                         if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
3055                            return 0;
3056                      }
3057                   }
3058                }
3059                // after all interleaved components, that's an interleaved MCU,
3060                // so now count down the restart interval
3061                if (--z->todo <= 0) {
3062                   if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
3063                   if (!STBI__RESTART(z->marker)) return 1;
3064                   stbi__jpeg_reset(z);
3065                }
3066             }
3067          }
3068          return 1;
3069       }
3070    }
3071 }
3072
3073 static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
3074 {
3075    int i;
3076    for (i=0; i < 64; ++i)
3077       data[i] *= dequant[i];
3078 }
3079
3080 static void stbi__jpeg_finish(stbi__jpeg *z)
3081 {
3082    if (z->progressive) {
3083       // dequantize and idct the data
3084       int i,j,n;
3085       for (n=0; n < z->s->img_n; ++n) {
3086          int w = (z->img_comp[n].x+7) >> 3;
3087          int h = (z->img_comp[n].y+7) >> 3;
3088          for (j=0; j < h; ++j) {
3089             for (i=0; i < w; ++i) {
3090                short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
3091                stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
3092                z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
3093             }
3094          }
3095       }
3096    }
3097 }
3098
3099 static int stbi__process_marker(stbi__jpeg *z, int m)
3100 {
3101    int L;
3102    switch (m) {
3103       case STBI__MARKER_none: // no marker found
3104          return stbi__err("expected marker","Corrupt JPEG");
3105
3106       case 0xDD: // DRI - specify restart interval
3107          if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
3108          z->restart_interval = stbi__get16be(z->s);
3109          return 1;
3110
3111       case 0xDB: // DQT - define quantization table
3112          L = stbi__get16be(z->s)-2;
3113          while (L > 0) {
3114             int q = stbi__get8(z->s);
3115             int p = q >> 4, sixteen = (p != 0);
3116             int t = q & 15,i;
3117             if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
3118             if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
3119
3120             for (i=0; i < 64; ++i)
3121                z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
3122             L -= (sixteen ? 129 : 65);
3123          }
3124          return L==0;
3125
3126       case 0xC4: // DHT - define huffman table
3127          L = stbi__get16be(z->s)-2;
3128          while (L > 0) {
3129             stbi_uc *v;
3130             int sizes[16],i,n=0;
3131             int q = stbi__get8(z->s);
3132             int tc = q >> 4;
3133             int th = q & 15;
3134             if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
3135             for (i=0; i < 16; ++i) {
3136                sizes[i] = stbi__get8(z->s);
3137                n += sizes[i];
3138             }
3139             if(n > 256) return stbi__err("bad DHT header","Corrupt JPEG"); // Loop over i < n would write past end of values!
3140             L -= 17;
3141             if (tc == 0) {
3142                if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
3143                v = z->huff_dc[th].values;
3144             } else {
3145                if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
3146                v = z->huff_ac[th].values;
3147             }
3148             for (i=0; i < n; ++i)
3149                v[i] = stbi__get8(z->s);
3150             if (tc != 0)
3151                stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
3152             L -= n;
3153          }
3154          return L==0;
3155    }
3156
3157    // check for comment block or APP blocks
3158    if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
3159       L = stbi__get16be(z->s);
3160       if (L < 2) {
3161          if (m == 0xFE)
3162             return stbi__err("bad COM len","Corrupt JPEG");
3163          else
3164             return stbi__err("bad APP len","Corrupt JPEG");
3165       }
3166       L -= 2;
3167
3168       if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
3169          static const unsigned char tag[5] = {'J','F','I','F','\0'};
3170          int ok = 1;
3171          int i;
3172          for (i=0; i < 5; ++i)
3173             if (stbi__get8(z->s) != tag[i])
3174                ok = 0;
3175          L -= 5;
3176          if (ok)
3177             z->jfif = 1;
3178       } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
3179          static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
3180          int ok = 1;
3181          int i;
3182          for (i=0; i < 6; ++i)
3183             if (stbi__get8(z->s) != tag[i])
3184                ok = 0;
3185          L -= 6;
3186          if (ok) {
3187             stbi__get8(z->s); // version
3188             stbi__get16be(z->s); // flags0
3189             stbi__get16be(z->s); // flags1
3190             z->app14_color_transform = stbi__get8(z->s); // color transform
3191             L -= 6;
3192          }
3193       }
3194
3195       stbi__skip(z->s, L);
3196       return 1;
3197    }
3198
3199    return stbi__err("unknown marker","Corrupt JPEG");
3200 }
3201
3202 // after we see SOS
3203 static int stbi__process_scan_header(stbi__jpeg *z)
3204 {
3205    int i;
3206    int Ls = stbi__get16be(z->s);
3207    z->scan_n = stbi__get8(z->s);
3208    if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
3209    if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
3210    for (i=0; i < z->scan_n; ++i) {
3211       int id = stbi__get8(z->s), which;
3212       int q = stbi__get8(z->s);
3213       for (which = 0; which < z->s->img_n; ++which)
3214          if (z->img_comp[which].id == id)
3215             break;
3216       if (which == z->s->img_n) return 0; // no match
3217       z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
3218       z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
3219       z->order[i] = which;
3220    }
3221
3222    {
3223       int aa;
3224       z->spec_start = stbi__get8(z->s);
3225       z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
3226       aa = stbi__get8(z->s);
3227       z->succ_high = (aa >> 4);
3228       z->succ_low  = (aa & 15);
3229       if (z->progressive) {
3230          if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
3231             return stbi__err("bad SOS", "Corrupt JPEG");
3232       } else {
3233          if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
3234          if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
3235          z->spec_end = 63;
3236       }
3237    }
3238
3239    return 1;
3240 }
3241
3242 static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
3243 {
3244    int i;
3245    for (i=0; i < ncomp; ++i) {
3246       if (z->img_comp[i].raw_data) {
3247          STBI_FREE(z->img_comp[i].raw_data);
3248          z->img_comp[i].raw_data = NULL;
3249          z->img_comp[i].data = NULL;
3250       }
3251       if (z->img_comp[i].raw_coeff) {
3252          STBI_FREE(z->img_comp[i].raw_coeff);
3253          z->img_comp[i].raw_coeff = 0;
3254          z->img_comp[i].coeff = 0;
3255       }
3256       if (z->img_comp[i].linebuf) {
3257          STBI_FREE(z->img_comp[i].linebuf);
3258          z->img_comp[i].linebuf = NULL;
3259       }
3260    }
3261    return why;
3262 }
3263
3264 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
3265 {
3266    stbi__context *s = z->s;
3267    int Lf,p,i,q, h_max=1,v_max=1,c;
3268    Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
3269    p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
3270    s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
3271    s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
3272    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
3273    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
3274    c = stbi__get8(s);
3275    if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
3276    s->img_n = c;
3277    for (i=0; i < c; ++i) {
3278       z->img_comp[i].data = NULL;
3279       z->img_comp[i].linebuf = NULL;
3280    }
3281
3282    if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
3283
3284    z->rgb = 0;
3285    for (i=0; i < s->img_n; ++i) {
3286       static const unsigned char rgb[3] = { 'R', 'G', 'B' };
3287       z->img_comp[i].id = stbi__get8(s);
3288       if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
3289          ++z->rgb;
3290       q = stbi__get8(s);
3291       z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
3292       z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
3293       z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
3294    }
3295
3296    if (scan != STBI__SCAN_load) return 1;
3297
3298    if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
3299
3300    for (i=0; i < s->img_n; ++i) {
3301       if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
3302       if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
3303    }
3304
3305    // check that plane subsampling factors are integer ratios; our resamplers can't deal with fractional ratios
3306    // and I've never seen a non-corrupted JPEG file actually use them
3307    for (i=0; i < s->img_n; ++i) {
3308       if (h_max % z->img_comp[i].h != 0) return stbi__err("bad H","Corrupt JPEG");
3309       if (v_max % z->img_comp[i].v != 0) return stbi__err("bad V","Corrupt JPEG");
3310    }
3311
3312    // compute interleaved mcu info
3313    z->img_h_max = h_max;
3314    z->img_v_max = v_max;
3315    z->img_mcu_w = h_max * 8;
3316    z->img_mcu_h = v_max * 8;
3317    // these sizes can't be more than 17 bits
3318    z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
3319    z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
3320
3321    for (i=0; i < s->img_n; ++i) {
3322       // number of effective pixels (e.g. for non-interleaved MCU)
3323       z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
3324       z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
3325       // to simplify generation, we'll allocate enough memory to decode
3326       // the bogus oversized data from using interleaved MCUs and their
3327       // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
3328       // discard the extra data until colorspace conversion
3329       //
3330       // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
3331       // so these muls can't overflow with 32-bit ints (which we require)
3332       z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
3333       z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
3334       z->img_comp[i].coeff = 0;
3335       z->img_comp[i].raw_coeff = 0;
3336       z->img_comp[i].linebuf = NULL;
3337       z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
3338       if (z->img_comp[i].raw_data == NULL)
3339          return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
3340       // align blocks for idct using mmx/sse
3341       z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
3342       if (z->progressive) {
3343          // w2, h2 are multiples of 8 (see above)
3344          z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
3345          z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
3346          z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
3347          if (z->img_comp[i].raw_coeff == NULL)
3348             return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
3349          z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
3350       }
3351    }
3352
3353    return 1;
3354 }
3355
3356 // use comparisons since in some cases we handle more than one case (e.g. SOF)
3357 #define stbi__DNL(x)         ((x) == 0xdc)
3358 #define stbi__SOI(x)         ((x) == 0xd8)
3359 #define stbi__EOI(x)         ((x) == 0xd9)
3360 #define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
3361 #define stbi__SOS(x)         ((x) == 0xda)
3362
3363 #define stbi__SOF_progressive(x)   ((x) == 0xc2)
3364
3365 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
3366 {
3367    int m;
3368    z->jfif = 0;
3369    z->app14_color_transform = -1; // valid values are 0,1,2
3370    z->marker = STBI__MARKER_none; // initialize cached marker to empty
3371    m = stbi__get_marker(z);
3372    if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
3373    if (scan == STBI__SCAN_type) return 1;
3374    m = stbi__get_marker(z);
3375    while (!stbi__SOF(m)) {
3376       if (!stbi__process_marker(z,m)) return 0;
3377       m = stbi__get_marker(z);
3378       while (m == STBI__MARKER_none) {
3379          // some files have extra padding after their blocks, so ok, we'll scan
3380          if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
3381          m = stbi__get_marker(z);
3382       }
3383    }
3384    z->progressive = stbi__SOF_progressive(m);
3385    if (!stbi__process_frame_header(z, scan)) return 0;
3386    return 1;
3387 }
3388
3389 static stbi_uc stbi__skip_jpeg_junk_at_end(stbi__jpeg *j)
3390 {
3391    // some JPEGs have junk at end, skip over it but if we find what looks
3392    // like a valid marker, resume there
3393    while (!stbi__at_eof(j->s)) {
3394       stbi_uc x = stbi__get8(j->s);
3395       while (x == 0xff) { // might be a marker
3396          if (stbi__at_eof(j->s)) return STBI__MARKER_none;
3397          x = stbi__get8(j->s);
3398          if (x != 0x00 && x != 0xff) {
3399             // not a stuffed zero or lead-in to another marker, looks
3400             // like an actual marker, return it
3401             return x;
3402          }
3403          // stuffed zero has x=0 now which ends the loop, meaning we go
3404          // back to regular scan loop.
3405          // repeated 0xff keeps trying to read the next byte of the marker.
3406       }
3407    }
3408    return STBI__MARKER_none;
3409 }
3410
3411 // decode image to YCbCr format
3412 static int stbi__decode_jpeg_image(stbi__jpeg *j)
3413 {
3414    int m;
3415    for (m = 0; m < 4; m++) {
3416       j->img_comp[m].raw_data = NULL;
3417       j->img_comp[m].raw_coeff = NULL;
3418    }
3419    j->restart_interval = 0;
3420    if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
3421    m = stbi__get_marker(j);
3422    while (!stbi__EOI(m)) {
3423       if (stbi__SOS(m)) {
3424          if (!stbi__process_scan_header(j)) return 0;
3425          if (!stbi__parse_entropy_coded_data(j)) return 0;
3426          if (j->marker == STBI__MARKER_none ) {
3427          j->marker = stbi__skip_jpeg_junk_at_end(j);
3428             // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
3429          }
3430          m = stbi__get_marker(j);
3431          if (STBI__RESTART(m))
3432             m = stbi__get_marker(j);
3433       } else if (stbi__DNL(m)) {
3434          int Ld = stbi__get16be(j->s);
3435          stbi__uint32 NL = stbi__get16be(j->s);
3436          if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
3437          if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
3438          m = stbi__get_marker(j);
3439       } else {
3440          if (!stbi__process_marker(j, m)) return 1;
3441          m = stbi__get_marker(j);
3442       }
3443    }
3444    if (j->progressive)
3445       stbi__jpeg_finish(j);
3446    return 1;
3447 }
3448
3449 // static jfif-centered resampling (across block boundaries)
3450
3451 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
3452                                     int w, int hs);
3453
3454 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
3455
3456 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3457 {
3458    STBI_NOTUSED(out);
3459    STBI_NOTUSED(in_far);
3460    STBI_NOTUSED(w);
3461    STBI_NOTUSED(hs);
3462    return in_near;
3463 }
3464
3465 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3466 {
3467    // need to generate two samples vertically for every one in input
3468    int i;
3469    STBI_NOTUSED(hs);
3470    for (i=0; i < w; ++i)
3471       out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
3472    return out;
3473 }
3474
3475 static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3476 {
3477    // need to generate two samples horizontally for every one in input
3478    int i;
3479    stbi_uc *input = in_near;
3480
3481    if (w == 1) {
3482       // if only one sample, can't do any interpolation
3483       out[0] = out[1] = input[0];
3484       return out;
3485    }
3486
3487    out[0] = input[0];
3488    out[1] = stbi__div4(input[0]*3 + input[1] + 2);
3489    for (i=1; i < w-1; ++i) {
3490       int n = 3*input[i]+2;
3491       out[i*2+0] = stbi__div4(n+input[i-1]);
3492       out[i*2+1] = stbi__div4(n+input[i+1]);
3493    }
3494    out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
3495    out[i*2+1] = input[w-1];
3496
3497    STBI_NOTUSED(in_far);
3498    STBI_NOTUSED(hs);
3499
3500    return out;
3501 }
3502
3503 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
3504
3505 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3506 {
3507    // need to generate 2x2 samples for every one in input
3508    int i,t0,t1;
3509    if (w == 1) {
3510       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
3511       return out;
3512    }
3513
3514    t1 = 3*in_near[0] + in_far[0];
3515    out[0] = stbi__div4(t1+2);
3516    for (i=1; i < w; ++i) {
3517       t0 = t1;
3518       t1 = 3*in_near[i]+in_far[i];
3519       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3520       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
3521    }
3522    out[w*2-1] = stbi__div4(t1+2);
3523
3524    STBI_NOTUSED(hs);
3525
3526    return out;
3527 }
3528
3529 #if defined(STBI_SSE2) || defined(STBI_NEON)
3530 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3531 {
3532    // need to generate 2x2 samples for every one in input
3533    int i=0,t0,t1;
3534
3535    if (w == 1) {
3536       out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
3537       return out;
3538    }
3539
3540    t1 = 3*in_near[0] + in_far[0];
3541    // process groups of 8 pixels for as long as we can.
3542    // note we can't handle the last pixel in a row in this loop
3543    // because we need to handle the filter boundary conditions.
3544    for (; i < ((w-1) & ~7); i += 8) {
3545 #if defined(STBI_SSE2)
3546       // load and perform the vertical filtering pass
3547       // this uses 3*x + y = 4*x + (y - x)
3548       __m128i zero  = _mm_setzero_si128();
3549       __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
3550       __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
3551       __m128i farw  = _mm_unpacklo_epi8(farb, zero);
3552       __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
3553       __m128i diff  = _mm_sub_epi16(farw, nearw);
3554       __m128i nears = _mm_slli_epi16(nearw, 2);
3555       __m128i curr  = _mm_add_epi16(nears, diff); // current row
3556
3557       // horizontal filter works the same based on shifted vers of current
3558       // row. "prev" is current row shifted right by 1 pixel; we need to
3559       // insert the previous pixel value (from t1).
3560       // "next" is current row shifted left by 1 pixel, with first pixel
3561       // of next block of 8 pixels added in.
3562       __m128i prv0 = _mm_slli_si128(curr, 2);
3563       __m128i nxt0 = _mm_srli_si128(curr, 2);
3564       __m128i prev = _mm_insert_epi16(prv0, t1, 0);
3565       __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
3566
3567       // horizontal filter, polyphase implementation since it's convenient:
3568       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3569       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3570       // note the shared term.
3571       __m128i bias  = _mm_set1_epi16(8);
3572       __m128i curs = _mm_slli_epi16(curr, 2);
3573       __m128i prvd = _mm_sub_epi16(prev, curr);
3574       __m128i nxtd = _mm_sub_epi16(next, curr);
3575       __m128i curb = _mm_add_epi16(curs, bias);
3576       __m128i even = _mm_add_epi16(prvd, curb);
3577       __m128i odd  = _mm_add_epi16(nxtd, curb);
3578
3579       // interleave even and odd pixels, then undo scaling.
3580       __m128i int0 = _mm_unpacklo_epi16(even, odd);
3581       __m128i int1 = _mm_unpackhi_epi16(even, odd);
3582       __m128i de0  = _mm_srli_epi16(int0, 4);
3583       __m128i de1  = _mm_srli_epi16(int1, 4);
3584
3585       // pack and write output
3586       __m128i outv = _mm_packus_epi16(de0, de1);
3587       _mm_storeu_si128((__m128i *) (out + i*2), outv);
3588 #elif defined(STBI_NEON)
3589       // load and perform the vertical filtering pass
3590       // this uses 3*x + y = 4*x + (y - x)
3591       uint8x8_t farb  = vld1_u8(in_far + i);
3592       uint8x8_t nearb = vld1_u8(in_near + i);
3593       int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
3594       int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
3595       int16x8_t curr  = vaddq_s16(nears, diff); // current row
3596
3597       // horizontal filter works the same based on shifted vers of current
3598       // row. "prev" is current row shifted right by 1 pixel; we need to
3599       // insert the previous pixel value (from t1).
3600       // "next" is current row shifted left by 1 pixel, with first pixel
3601       // of next block of 8 pixels added in.
3602       int16x8_t prv0 = vextq_s16(curr, curr, 7);
3603       int16x8_t nxt0 = vextq_s16(curr, curr, 1);
3604       int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
3605       int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
3606
3607       // horizontal filter, polyphase implementation since it's convenient:
3608       // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3609       // odd  pixels = 3*cur + next = cur*4 + (next - cur)
3610       // note the shared term.
3611       int16x8_t curs = vshlq_n_s16(curr, 2);
3612       int16x8_t prvd = vsubq_s16(prev, curr);
3613       int16x8_t nxtd = vsubq_s16(next, curr);
3614       int16x8_t even = vaddq_s16(curs, prvd);
3615       int16x8_t odd  = vaddq_s16(curs, nxtd);
3616
3617       // undo scaling and round, then store with even/odd phases interleaved
3618       uint8x8x2_t o;
3619       o.val[0] = vqrshrun_n_s16(even, 4);
3620       o.val[1] = vqrshrun_n_s16(odd,  4);
3621       vst2_u8(out + i*2, o);
3622 #endif
3623
3624       // "previous" value for next iter
3625       t1 = 3*in_near[i+7] + in_far[i+7];
3626    }
3627
3628    t0 = t1;
3629    t1 = 3*in_near[i] + in_far[i];
3630    out[i*2] = stbi__div16(3*t1 + t0 + 8);
3631
3632    for (++i; i < w; ++i) {
3633       t0 = t1;
3634       t1 = 3*in_near[i]+in_far[i];
3635       out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3636       out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
3637    }
3638    out[w*2-1] = stbi__div4(t1+2);
3639
3640    STBI_NOTUSED(hs);
3641
3642    return out;
3643 }
3644 #endif
3645
3646 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3647 {
3648    // resample with nearest-neighbor
3649    int i,j;
3650    STBI_NOTUSED(in_far);
3651    for (i=0; i < w; ++i)
3652       for (j=0; j < hs; ++j)
3653          out[i*hs+j] = in_near[i];
3654    return out;
3655 }
3656
3657 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
3658 // to make sure the code produces the same results in both SIMD and scalar
3659 #define stbi__float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
3660 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3661 {
3662    int i;
3663    for (i=0; i < count; ++i) {
3664       int y_fixed = (y[i] << 20) + (1<<19); // rounding
3665       int r,g,b;
3666       int cr = pcr[i] - 128;
3667       int cb = pcb[i] - 128;
3668       r = y_fixed +  cr* stbi__float2fixed(1.40200f);
3669       g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
3670       b = y_fixed                                     +   cb* stbi__float2fixed(1.77200f);
3671       r >>= 20;
3672       g >>= 20;
3673       b >>= 20;
3674       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3675       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3676       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3677       out[0] = (stbi_uc)r;
3678       out[1] = (stbi_uc)g;
3679       out[2] = (stbi_uc)b;
3680       out[3] = 255;
3681       out += step;
3682    }
3683 }
3684
3685 #if defined(STBI_SSE2) || defined(STBI_NEON)
3686 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
3687 {
3688    int i = 0;
3689
3690 #ifdef STBI_SSE2
3691    // step == 3 is pretty ugly on the final interleave, and i'm not convinced
3692    // it's useful in practice (you wouldn't use it for textures, for example).
3693    // so just accelerate step == 4 case.
3694    if (step == 4) {
3695       // this is a fairly straightforward implementation and not super-optimized.
3696       __m128i signflip  = _mm_set1_epi8(-0x80);
3697       __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
3698       __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
3699       __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
3700       __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
3701       __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
3702       __m128i xw = _mm_set1_epi16(255); // alpha channel
3703
3704       for (; i+7 < count; i += 8) {
3705          // load
3706          __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
3707          __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
3708          __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
3709          __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
3710          __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
3711
3712          // unpack to short (and left-shift cr, cb by 8)
3713          __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
3714          __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
3715          __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
3716
3717          // color transform
3718          __m128i yws = _mm_srli_epi16(yw, 4);
3719          __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
3720          __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
3721          __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
3722          __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
3723          __m128i rws = _mm_add_epi16(cr0, yws);
3724          __m128i gwt = _mm_add_epi16(cb0, yws);
3725          __m128i bws = _mm_add_epi16(yws, cb1);
3726          __m128i gws = _mm_add_epi16(gwt, cr1);
3727
3728          // descale
3729          __m128i rw = _mm_srai_epi16(rws, 4);
3730          __m128i bw = _mm_srai_epi16(bws, 4);
3731          __m128i gw = _mm_srai_epi16(gws, 4);
3732
3733          // back to byte, set up for transpose
3734          __m128i brb = _mm_packus_epi16(rw, bw);
3735          __m128i gxb = _mm_packus_epi16(gw, xw);
3736
3737          // transpose to interleave channels
3738          __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
3739          __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
3740          __m128i o0 = _mm_unpacklo_epi16(t0, t1);
3741          __m128i o1 = _mm_unpackhi_epi16(t0, t1);
3742
3743          // store
3744          _mm_storeu_si128((__m128i *) (out + 0), o0);
3745          _mm_storeu_si128((__m128i *) (out + 16), o1);
3746          out += 32;
3747       }
3748    }
3749 #endif
3750
3751 #ifdef STBI_NEON
3752    // in this version, step=3 support would be easy to add. but is there demand?
3753    if (step == 4) {
3754       // this is a fairly straightforward implementation and not super-optimized.
3755       uint8x8_t signflip = vdup_n_u8(0x80);
3756       int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
3757       int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
3758       int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
3759       int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
3760
3761       for (; i+7 < count; i += 8) {
3762          // load
3763          uint8x8_t y_bytes  = vld1_u8(y + i);
3764          uint8x8_t cr_bytes = vld1_u8(pcr + i);
3765          uint8x8_t cb_bytes = vld1_u8(pcb + i);
3766          int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
3767          int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
3768
3769          // expand to s16
3770          int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
3771          int16x8_t crw = vshll_n_s8(cr_biased, 7);
3772          int16x8_t cbw = vshll_n_s8(cb_biased, 7);
3773
3774          // color transform
3775          int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
3776          int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
3777          int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
3778          int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
3779          int16x8_t rws = vaddq_s16(yws, cr0);
3780          int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
3781          int16x8_t bws = vaddq_s16(yws, cb1);
3782
3783          // undo scaling, round, convert to byte
3784          uint8x8x4_t o;
3785          o.val[0] = vqrshrun_n_s16(rws, 4);
3786          o.val[1] = vqrshrun_n_s16(gws, 4);
3787          o.val[2] = vqrshrun_n_s16(bws, 4);
3788          o.val[3] = vdup_n_u8(255);
3789
3790          // store, interleaving r/g/b/a
3791          vst4_u8(out, o);
3792          out += 8*4;
3793       }
3794    }
3795 #endif
3796
3797    for (; i < count; ++i) {
3798       int y_fixed = (y[i] << 20) + (1<<19); // rounding
3799       int r,g,b;
3800       int cr = pcr[i] - 128;
3801       int cb = pcb[i] - 128;
3802       r = y_fixed + cr* stbi__float2fixed(1.40200f);
3803       g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
3804       b = y_fixed                                   +   cb* stbi__float2fixed(1.77200f);
3805       r >>= 20;
3806       g >>= 20;
3807       b >>= 20;
3808       if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3809       if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3810       if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3811       out[0] = (stbi_uc)r;
3812       out[1] = (stbi_uc)g;
3813       out[2] = (stbi_uc)b;
3814       out[3] = 255;
3815       out += step;
3816    }
3817 }
3818 #endif
3819
3820 // set up the kernels
3821 static void stbi__setup_jpeg(stbi__jpeg *j)
3822 {
3823    j->idct_block_kernel = stbi__idct_block;
3824    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
3825    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
3826
3827 #ifdef STBI_SSE2
3828    if (stbi__sse2_available()) {
3829       j->idct_block_kernel = stbi__idct_simd;
3830       j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3831       j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3832    }
3833 #endif
3834
3835 #ifdef STBI_NEON
3836    j->idct_block_kernel = stbi__idct_simd;
3837    j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3838    j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3839 #endif
3840 }
3841
3842 // clean up the temporary component buffers
3843 static void stbi__cleanup_jpeg(stbi__jpeg *j)
3844 {
3845    stbi__free_jpeg_components(j, j->s->img_n, 0);
3846 }
3847
3848 typedef struct
3849 {
3850    resample_row_func resample;
3851    stbi_uc *line0,*line1;
3852    int hs,vs;   // expansion factor in each axis
3853    int w_lores; // horizontal pixels pre-expansion
3854    int ystep;   // how far through vertical expansion we are
3855    int ypos;    // which pre-expansion row we're on
3856 } stbi__resample;
3857
3858 // fast 0..255 * 0..255 => 0..255 rounded multiplication
3859 static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
3860 {
3861    unsigned int t = x*y + 128;
3862    return (stbi_uc) ((t + (t >>8)) >> 8);
3863 }
3864
3865 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
3866 {
3867    int n, decode_n, is_rgb;
3868    z->s->img_n = 0; // make stbi__cleanup_jpeg safe
3869
3870    // validate req_comp
3871    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
3872
3873    // load a jpeg image from whichever source, but leave in YCbCr format
3874    if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
3875
3876    // determine actual number of components to generate
3877    n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
3878
3879    is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
3880
3881    if (z->s->img_n == 3 && n < 3 && !is_rgb)
3882       decode_n = 1;
3883    else
3884       decode_n = z->s->img_n;
3885
3886    // nothing to do if no components requested; check this now to avoid
3887    // accessing uninitialized coutput[0] later
3888    if (decode_n <= 0) { stbi__cleanup_jpeg(z); return NULL; }
3889
3890    // resample and color-convert
3891    {
3892       int k;
3893       unsigned int i,j;
3894       stbi_uc *output;
3895       stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL };
3896
3897       stbi__resample res_comp[4];
3898
3899       for (k=0; k < decode_n; ++k) {
3900          stbi__resample *r = &res_comp[k];
3901
3902          // allocate line buffer big enough for upsampling off the edges
3903          // with upsample factor of 4
3904          z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
3905          if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3906
3907          r->hs      = z->img_h_max / z->img_comp[k].h;
3908          r->vs      = z->img_v_max / z->img_comp[k].v;
3909          r->ystep   = r->vs >> 1;
3910          r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
3911          r->ypos    = 0;
3912          r->line0   = r->line1 = z->img_comp[k].data;
3913
3914          if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
3915          else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
3916          else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
3917          else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
3918          else                               r->resample = stbi__resample_row_generic;
3919       }
3920
3921       // can't error after this so, this is safe
3922       output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
3923       if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3924
3925       // now go ahead and resample
3926       for (j=0; j < z->s->img_y; ++j) {
3927          stbi_uc *out = output + n * z->s->img_x * j;
3928          for (k=0; k < decode_n; ++k) {
3929             stbi__resample *r = &res_comp[k];
3930             int y_bot = r->ystep >= (r->vs >> 1);
3931             coutput[k] = r->resample(z->img_comp[k].linebuf,
3932                                      y_bot ? r->line1 : r->line0,
3933                                      y_bot ? r->line0 : r->line1,
3934                                      r->w_lores, r->hs);
3935             if (++r->ystep >= r->vs) {
3936                r->ystep = 0;
3937                r->line0 = r->line1;
3938                if (++r->ypos < z->img_comp[k].y)
3939                   r->line1 += z->img_comp[k].w2;
3940             }
3941          }
3942          if (n >= 3) {
3943             stbi_uc *y = coutput[0];
3944             if (z->s->img_n == 3) {
3945                if (is_rgb) {
3946                   for (i=0; i < z->s->img_x; ++i) {
3947                      out[0] = y[i];
3948                      out[1] = coutput[1][i];
3949                      out[2] = coutput[2][i];
3950                      out[3] = 255;
3951                      out += n;
3952                   }
3953                } else {
3954                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3955                }
3956             } else if (z->s->img_n == 4) {
3957                if (z->app14_color_transform == 0) { // CMYK
3958                   for (i=0; i < z->s->img_x; ++i) {
3959                      stbi_uc m = coutput[3][i];
3960                      out[0] = stbi__blinn_8x8(coutput[0][i], m);
3961                      out[1] = stbi__blinn_8x8(coutput[1][i], m);
3962                      out[2] = stbi__blinn_8x8(coutput[2][i], m);
3963                      out[3] = 255;
3964                      out += n;
3965                   }
3966                } else if (z->app14_color_transform == 2) { // YCCK
3967                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3968                   for (i=0; i < z->s->img_x; ++i) {
3969                      stbi_uc m = coutput[3][i];
3970                      out[0] = stbi__blinn_8x8(255 - out[0], m);
3971                      out[1] = stbi__blinn_8x8(255 - out[1], m);
3972                      out[2] = stbi__blinn_8x8(255 - out[2], m);
3973                      out += n;
3974                   }
3975                } else { // YCbCr + alpha?  Ignore the fourth channel for now
3976                   z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3977                }
3978             } else
3979                for (i=0; i < z->s->img_x; ++i) {
3980                   out[0] = out[1] = out[2] = y[i];
3981                   out[3] = 255; // not used if n==3
3982                   out += n;
3983                }
3984          } else {
3985             if (is_rgb) {
3986                if (n == 1)
3987                   for (i=0; i < z->s->img_x; ++i)
3988                      *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
3989                else {
3990                   for (i=0; i < z->s->img_x; ++i, out += 2) {
3991                      out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
3992                      out[1] = 255;
3993                   }
3994                }
3995             } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
3996                for (i=0; i < z->s->img_x; ++i) {
3997                   stbi_uc m = coutput[3][i];
3998                   stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
3999                   stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
4000                   stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
4001                   out[0] = stbi__compute_y(r, g, b);
4002                   out[1] = 255;
4003                   out += n;
4004                }
4005             } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
4006                for (i=0; i < z->s->img_x; ++i) {
4007                   out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
4008                   out[1] = 255;
4009                   out += n;
4010                }
4011             } else {
4012                stbi_uc *y = coutput[0];
4013                if (n == 1)
4014                   for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
4015                else
4016                   for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; }
4017             }
4018          }
4019       }
4020       stbi__cleanup_jpeg(z);
4021       *out_x = z->s->img_x;
4022       *out_y = z->s->img_y;
4023       if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
4024       return output;
4025    }
4026 }
4027
4028 static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
4029 {
4030    unsigned char* result;
4031    stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
4032    if (!j) return stbi__errpuc("outofmem", "Out of memory");
4033    memset(j, 0, sizeof(stbi__jpeg));
4034    STBI_NOTUSED(ri);
4035    j->s = s;
4036    stbi__setup_jpeg(j);
4037    result = load_jpeg_image(j, x,y,comp,req_comp);
4038    STBI_FREE(j);
4039    return result;
4040 }
4041
4042 static int stbi__jpeg_test(stbi__context *s)
4043 {
4044    int r;
4045    stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
4046    if (!j) return stbi__err("outofmem", "Out of memory");
4047    memset(j, 0, sizeof(stbi__jpeg));
4048    j->s = s;
4049    stbi__setup_jpeg(j);
4050    r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
4051    stbi__rewind(s);
4052    STBI_FREE(j);
4053    return r;
4054 }
4055
4056 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
4057 {
4058    if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
4059       stbi__rewind( j->s );
4060       return 0;
4061    }
4062    if (x) *x = j->s->img_x;
4063    if (y) *y = j->s->img_y;
4064    if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
4065    return 1;
4066 }
4067
4068 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
4069 {
4070    int result;
4071    stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
4072    if (!j) return stbi__err("outofmem", "Out of memory");
4073    memset(j, 0, sizeof(stbi__jpeg));
4074    j->s = s;
4075    result = stbi__jpeg_info_raw(j, x, y, comp);
4076    STBI_FREE(j);
4077    return result;
4078 }
4079 #endif
4080
4081 // public domain zlib decode    v0.2  Sean Barrett 2006-11-18
4082 //    simple implementation
4083 //      - all input must be provided in an upfront buffer
4084 //      - all output is written to a single output buffer (can malloc/realloc)
4085 //    performance
4086 //      - fast huffman
4087
4088 #ifndef STBI_NO_ZLIB
4089
4090 // fast-way is faster to check than jpeg huffman, but slow way is slower
4091 #define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
4092 #define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
4093 #define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet
4094
4095 // zlib-style huffman encoding
4096 // (jpegs packs from left, zlib from right, so can't share code)
4097 typedef struct
4098 {
4099    stbi__uint16 fast[1 << STBI__ZFAST_BITS];
4100    stbi__uint16 firstcode[16];
4101    int maxcode[17];
4102    stbi__uint16 firstsymbol[16];
4103    stbi_uc  size[STBI__ZNSYMS];
4104    stbi__uint16 value[STBI__ZNSYMS];
4105 } stbi__zhuffman;
4106
4107 stbi_inline static int stbi__bitreverse16(int n)
4108 {
4109   n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
4110   n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
4111   n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
4112   n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
4113   return n;
4114 }
4115
4116 stbi_inline static int stbi__bit_reverse(int v, int bits)
4117 {
4118    STBI_ASSERT(bits <= 16);
4119    // to bit reverse n bits, reverse 16 and shift
4120    // e.g. 11 bits, bit reverse and shift away 5
4121    return stbi__bitreverse16(v) >> (16-bits);
4122 }
4123
4124 static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
4125 {
4126    int i,k=0;
4127    int code, next_code[16], sizes[17];
4128
4129    // DEFLATE spec for generating codes
4130    memset(sizes, 0, sizeof(sizes));
4131    memset(z->fast, 0, sizeof(z->fast));
4132    for (i=0; i < num; ++i)
4133       ++sizes[sizelist[i]];
4134    sizes[0] = 0;
4135    for (i=1; i < 16; ++i)
4136       if (sizes[i] > (1 << i))
4137          return stbi__err("bad sizes", "Corrupt PNG");
4138    code = 0;
4139    for (i=1; i < 16; ++i) {
4140       next_code[i] = code;
4141       z->firstcode[i] = (stbi__uint16) code;
4142       z->firstsymbol[i] = (stbi__uint16) k;
4143       code = (code + sizes[i]);
4144       if (sizes[i])
4145          if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
4146       z->maxcode[i] = code << (16-i); // preshift for inner loop
4147       code <<= 1;
4148       k += sizes[i];
4149    }
4150    z->maxcode[16] = 0x10000; // sentinel
4151    for (i=0; i < num; ++i) {
4152       int s = sizelist[i];
4153       if (s) {
4154          int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
4155          stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
4156          z->size [c] = (stbi_uc     ) s;
4157          z->value[c] = (stbi__uint16) i;
4158          if (s <= STBI__ZFAST_BITS) {
4159             int j = stbi__bit_reverse(next_code[s],s);
4160             while (j < (1 << STBI__ZFAST_BITS)) {
4161                z->fast[j] = fastv;
4162                j += (1 << s);
4163             }
4164          }
4165          ++next_code[s];
4166       }
4167    }
4168    return 1;
4169 }
4170
4171 // zlib-from-memory implementation for PNG reading
4172 //    because PNG allows splitting the zlib stream arbitrarily,
4173 //    and it's annoying structurally to have PNG call ZLIB call PNG,
4174 //    we require PNG read all the IDATs and combine them into a single
4175 //    memory buffer
4176
4177 typedef struct
4178 {
4179    stbi_uc *zbuffer, *zbuffer_end;
4180    int num_bits;
4181    int hit_zeof_once;
4182    stbi__uint32 code_buffer;
4183
4184    char *zout;
4185    char *zout_start;
4186    char *zout_end;
4187    int   z_expandable;
4188
4189    stbi__zhuffman z_length, z_distance;
4190 } stbi__zbuf;
4191
4192 stbi_inline static int stbi__zeof(stbi__zbuf *z)
4193 {
4194    return (z->zbuffer >= z->zbuffer_end);
4195 }
4196
4197 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
4198 {
4199    return stbi__zeof(z) ? 0 : *z->zbuffer++;
4200 }
4201
4202 static void stbi__fill_bits(stbi__zbuf *z)
4203 {
4204    do {
4205       if (z->code_buffer >= (1U << z->num_bits)) {
4206         z->zbuffer = z->zbuffer_end;  /* treat this as EOF so we fail. */
4207         return;
4208       }
4209       z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
4210       z->num_bits += 8;
4211    } while (z->num_bits <= 24);
4212 }
4213
4214 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
4215 {
4216    unsigned int k;
4217    if (z->num_bits < n) stbi__fill_bits(z);
4218    k = z->code_buffer & ((1 << n) - 1);
4219    z->code_buffer >>= n;
4220    z->num_bits -= n;
4221    return k;
4222 }
4223
4224 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
4225 {
4226    int b,s,k;
4227    // not resolved by fast table, so compute it the slow way
4228    // use jpeg approach, which requires MSbits at top
4229    k = stbi__bit_reverse(a->code_buffer, 16);
4230    for (s=STBI__ZFAST_BITS+1; ; ++s)
4231       if (k < z->maxcode[s])
4232          break;
4233    if (s >= 16) return -1; // invalid code!
4234    // code size is s, so:
4235    b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
4236    if (b >= STBI__ZNSYMS) return -1; // some data was corrupt somewhere!
4237    if (z->size[b] != s) return -1;  // was originally an assert, but report failure instead.
4238    a->code_buffer >>= s;
4239    a->num_bits -= s;
4240    return z->value[b];
4241 }
4242
4243 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
4244 {
4245    int b,s;
4246    if (a->num_bits < 16) {
4247       if (stbi__zeof(a)) {
4248          if (!a->hit_zeof_once) {
4249             // This is the first time we hit eof, insert 16 extra padding btis
4250             // to allow us to keep going; if we actually consume any of them
4251             // though, that is invalid data. This is caught later.
4252             a->hit_zeof_once = 1;
4253             a->num_bits += 16; // add 16 implicit zero bits
4254          } else {
4255             // We already inserted our extra 16 padding bits and are again
4256             // out, this stream is actually prematurely terminated.
4257             return -1;
4258          }
4259       } else {
4260          stbi__fill_bits(a);
4261       }
4262    }
4263    b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
4264    if (b) {
4265       s = b >> 9;
4266       a->code_buffer >>= s;
4267       a->num_bits -= s;
4268       return b & 511;
4269    }
4270    return stbi__zhuffman_decode_slowpath(a, z);
4271 }
4272
4273 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
4274 {
4275    char *q;
4276    unsigned int cur, limit, old_limit;
4277    z->zout = zout;
4278    if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
4279    cur   = (unsigned int) (z->zout - z->zout_start);
4280    limit = old_limit = (unsigned) (z->zout_end - z->zout_start);
4281    if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory");
4282    while (cur + n > limit) {
4283       if(limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory");
4284       limit *= 2;
4285    }
4286    q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
4287    STBI_NOTUSED(old_limit);
4288    if (q == NULL) return stbi__err("outofmem", "Out of memory");
4289    z->zout_start = q;
4290    z->zout       = q + cur;
4291    z->zout_end   = q + limit;
4292    return 1;
4293 }
4294
4295 static const int stbi__zlength_base[31] = {
4296    3,4,5,6,7,8,9,10,11,13,
4297    15,17,19,23,27,31,35,43,51,59,
4298    67,83,99,115,131,163,195,227,258,0,0 };
4299
4300 static const int stbi__zlength_extra[31]=
4301 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
4302
4303 static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
4304 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
4305
4306 static const int stbi__zdist_extra[32] =
4307 { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
4308
4309 static int stbi__parse_huffman_block(stbi__zbuf *a)
4310 {
4311    char *zout = a->zout;
4312    for(;;) {
4313       int z = stbi__zhuffman_decode(a, &a->z_length);
4314       if (z < 256) {
4315          if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
4316          if (zout >= a->zout_end) {
4317             if (!stbi__zexpand(a, zout, 1)) return 0;
4318             zout = a->zout;
4319          }
4320          *zout++ = (char) z;
4321       } else {
4322          stbi_uc *p;
4323          int len,dist;
4324          if (z == 256) {
4325             a->zout = zout;
4326             if (a->hit_zeof_once && a->num_bits < 16) {
4327                // The first time we hit zeof, we inserted 16 extra zero bits into our bit
4328                // buffer so the decoder can just do its speculative decoding. But if we
4329                // actually consumed any of those bits (which is the case when num_bits < 16),
4330                // the stream actually read past the end so it is malformed.
4331                return stbi__err("unexpected end","Corrupt PNG");
4332             }
4333             return 1;
4334          }
4335          if (z >= 286) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, length codes 286 and 287 must not appear in compressed data
4336          z -= 257;
4337          len = stbi__zlength_base[z];
4338          if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
4339          z = stbi__zhuffman_decode(a, &a->z_distance);
4340          if (z < 0 || z >= 30) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, distance codes 30 and 31 must not appear in compressed data
4341          dist = stbi__zdist_base[z];
4342          if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
4343          if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
4344          if (len > a->zout_end - zout) {
4345             if (!stbi__zexpand(a, zout, len)) return 0;
4346             zout = a->zout;
4347          }
4348          p = (stbi_uc *) (zout - dist);
4349          if (dist == 1) { // run of one byte; common in images.
4350             stbi_uc v = *p;
4351             if (len) { do *zout++ = v; while (--len); }
4352          } else {
4353             if (len) { do *zout++ = *p++; while (--len); }
4354          }
4355       }
4356    }
4357 }
4358
4359 static int stbi__compute_huffman_codes(stbi__zbuf *a)
4360 {
4361    static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
4362    stbi__zhuffman z_codelength;
4363    stbi_uc lencodes[286+32+137];//padding for maximum single op
4364    stbi_uc codelength_sizes[19];
4365    int i,n;
4366
4367    int hlit  = stbi__zreceive(a,5) + 257;
4368    int hdist = stbi__zreceive(a,5) + 1;
4369    int hclen = stbi__zreceive(a,4) + 4;
4370    int ntot  = hlit + hdist;
4371
4372    memset(codelength_sizes, 0, sizeof(codelength_sizes));
4373    for (i=0; i < hclen; ++i) {
4374       int s = stbi__zreceive(a,3);
4375       codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
4376    }
4377    if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
4378
4379    n = 0;
4380    while (n < ntot) {
4381       int c = stbi__zhuffman_decode(a, &z_codelength);
4382       if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
4383       if (c < 16)
4384          lencodes[n++] = (stbi_uc) c;
4385       else {
4386          stbi_uc fill = 0;
4387          if (c == 16) {
4388             c = stbi__zreceive(a,2)+3;
4389             if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
4390             fill = lencodes[n-1];
4391          } else if (c == 17) {
4392             c = stbi__zreceive(a,3)+3;
4393          } else if (c == 18) {
4394             c = stbi__zreceive(a,7)+11;
4395          } else {
4396             return stbi__err("bad codelengths", "Corrupt PNG");
4397          }
4398          if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
4399          memset(lencodes+n, fill, c);
4400          n += c;
4401       }
4402    }
4403    if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
4404    if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
4405    if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
4406    return 1;
4407 }
4408
4409 static int stbi__parse_uncompressed_block(stbi__zbuf *a)
4410 {
4411    stbi_uc header[4];
4412    int len,nlen,k;
4413    if (a->num_bits & 7)
4414       stbi__zreceive(a, a->num_bits & 7); // discard
4415    // drain the bit-packed data into header
4416    k = 0;
4417    while (a->num_bits > 0) {
4418       header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
4419       a->code_buffer >>= 8;
4420       a->num_bits -= 8;
4421    }
4422    if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG");
4423    // now fill header the normal way
4424    while (k < 4)
4425       header[k++] = stbi__zget8(a);
4426    len  = header[1] * 256 + header[0];
4427    nlen = header[3] * 256 + header[2];
4428    if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
4429    if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
4430    if (a->zout + len > a->zout_end)
4431       if (!stbi__zexpand(a, a->zout, len)) return 0;
4432    memcpy(a->zout, a->zbuffer, len);
4433    a->zbuffer += len;
4434    a->zout += len;
4435    return 1;
4436 }
4437
4438 static int stbi__parse_zlib_header(stbi__zbuf *a)
4439 {
4440    int cmf   = stbi__zget8(a);
4441    int cm    = cmf & 15;
4442    /* int cinfo = cmf >> 4; */
4443    int flg   = stbi__zget8(a);
4444    if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
4445    if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
4446    if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
4447    if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
4448    // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
4449    return 1;
4450 }
4451
4452 static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] =
4453 {
4454    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4455    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4456    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4457    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
4458    8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4459    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4460    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4461    9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
4462    7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
4463 };
4464 static const stbi_uc stbi__zdefault_distance[32] =
4465 {
4466    5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
4467 };
4468 /*
4469 Init algorithm:
4470 {
4471    int i;   // use <= to match clearly with spec
4472    for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
4473    for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
4474    for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
4475    for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
4476
4477    for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
4478 }
4479 */
4480
4481 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
4482 {
4483    int final, type;
4484    if (parse_header)
4485       if (!stbi__parse_zlib_header(a)) return 0;
4486    a->num_bits = 0;
4487    a->code_buffer = 0;
4488    a->hit_zeof_once = 0;
4489    do {
4490       final = stbi__zreceive(a,1);
4491       type = stbi__zreceive(a,2);
4492       if (type == 0) {
4493          if (!stbi__parse_uncompressed_block(a)) return 0;
4494       } else if (type == 3) {
4495          return 0;
4496       } else {
4497          if (type == 1) {
4498             // use fixed code lengths
4499             if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , STBI__ZNSYMS)) return 0;
4500             if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
4501          } else {
4502             if (!stbi__compute_huffman_codes(a)) return 0;
4503          }
4504          if (!stbi__parse_huffman_block(a)) return 0;
4505       }
4506    } while (!final);
4507    return 1;
4508 }
4509
4510 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
4511 {
4512    a->zout_start = obuf;
4513    a->zout       = obuf;
4514    a->zout_end   = obuf + olen;
4515    a->z_expandable = exp;
4516
4517    return stbi__parse_zlib(a, parse_header);
4518 }
4519
4520 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
4521 {
4522    stbi__zbuf a;
4523    char *p = (char *) stbi__malloc(initial_size);
4524    if (p == NULL) return NULL;
4525    a.zbuffer = (stbi_uc *) buffer;
4526    a.zbuffer_end = (stbi_uc *) buffer + len;
4527    if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
4528       if (outlen) *outlen = (int) (a.zout - a.zout_start);
4529       return a.zout_start;
4530    } else {
4531       STBI_FREE(a.zout_start);
4532       return NULL;
4533    }
4534 }
4535
4536 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
4537 {
4538    return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
4539 }
4540
4541 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
4542 {
4543    stbi__zbuf a;
4544    char *p = (char *) stbi__malloc(initial_size);
4545    if (p == NULL) return NULL;
4546    a.zbuffer = (stbi_uc *) buffer;
4547    a.zbuffer_end = (stbi_uc *) buffer + len;
4548    if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
4549       if (outlen) *outlen = (int) (a.zout - a.zout_start);
4550       return a.zout_start;
4551    } else {
4552       STBI_FREE(a.zout_start);
4553       return NULL;
4554    }
4555 }
4556
4557 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
4558 {
4559    stbi__zbuf a;
4560    a.zbuffer = (stbi_uc *) ibuffer;
4561    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
4562    if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
4563       return (int) (a.zout - a.zout_start);
4564    else
4565       return -1;
4566 }
4567
4568 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
4569 {
4570    stbi__zbuf a;
4571    char *p = (char *) stbi__malloc(16384);
4572    if (p == NULL) return NULL;
4573    a.zbuffer = (stbi_uc *) buffer;
4574    a.zbuffer_end = (stbi_uc *) buffer+len;
4575    if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
4576       if (outlen) *outlen = (int) (a.zout - a.zout_start);
4577       return a.zout_start;
4578    } else {
4579       STBI_FREE(a.zout_start);
4580       return NULL;
4581    }
4582 }
4583
4584 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
4585 {
4586    stbi__zbuf a;
4587    a.zbuffer = (stbi_uc *) ibuffer;
4588    a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
4589    if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
4590       return (int) (a.zout - a.zout_start);
4591    else
4592       return -1;
4593 }
4594 #endif
4595
4596 // public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
4597 //    simple implementation
4598 //      - only 8-bit samples
4599 //      - no CRC checking
4600 //      - allocates lots of intermediate memory
4601 //        - avoids problem of streaming data between subsystems
4602 //        - avoids explicit window management
4603 //    performance
4604 //      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
4605
4606 #ifndef STBI_NO_PNG
4607 typedef struct
4608 {
4609    stbi__uint32 length;
4610    stbi__uint32 type;
4611 } stbi__pngchunk;
4612
4613 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
4614 {
4615    stbi__pngchunk c;
4616    c.length = stbi__get32be(s);
4617    c.type   = stbi__get32be(s);
4618    return c;
4619 }
4620
4621 static int stbi__check_png_header(stbi__context *s)
4622 {
4623    static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
4624    int i;
4625    for (i=0; i < 8; ++i)
4626       if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
4627    return 1;
4628 }
4629
4630 typedef struct
4631 {
4632    stbi__context *s;
4633    stbi_uc *idata, *expanded, *out;
4634    int depth;
4635 } stbi__png;
4636
4637
4638 enum {
4639    STBI__F_none=0,
4640    STBI__F_sub=1,
4641    STBI__F_up=2,
4642    STBI__F_avg=3,
4643    STBI__F_paeth=4,
4644    // synthetic filter used for first scanline to avoid needing a dummy row of 0s
4645    STBI__F_avg_first
4646 };
4647
4648 static stbi_uc first_row_filter[5] =
4649 {
4650    STBI__F_none,
4651    STBI__F_sub,
4652    STBI__F_none,
4653    STBI__F_avg_first,
4654    STBI__F_sub // Paeth with b=c=0 turns out to be equivalent to sub
4655 };
4656
4657 static int stbi__paeth(int a, int b, int c)
4658 {
4659    // This formulation looks very different from the reference in the PNG spec, but is
4660    // actually equivalent and has favorable data dependencies and admits straightforward
4661    // generation of branch-free code, which helps performance significantly.
4662    int thresh = c*3 - (a + b);
4663    int lo = a < b ? a : b;
4664    int hi = a < b ? b : a;
4665    int t0 = (hi <= thresh) ? lo : c;
4666    int t1 = (thresh <= lo) ? hi : t0;
4667    return t1;
4668 }
4669
4670 static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
4671
4672 // adds an extra all-255 alpha channel
4673 // dest == src is legal
4674 // img_n must be 1 or 3
4675 static void stbi__create_png_alpha_expand8(stbi_uc *dest, stbi_uc *src, stbi__uint32 x, int img_n)
4676 {
4677    int i;
4678    // must process data backwards since we allow dest==src
4679    if (img_n == 1) {
4680       for (i=x-1; i >= 0; --i) {
4681          dest[i*2+1] = 255;
4682          dest[i*2+0] = src[i];
4683       }
4684    } else {
4685       STBI_ASSERT(img_n == 3);
4686       for (i=x-1; i >= 0; --i) {
4687          dest[i*4+3] = 255;
4688          dest[i*4+2] = src[i*3+2];
4689          dest[i*4+1] = src[i*3+1];
4690          dest[i*4+0] = src[i*3+0];
4691       }
4692    }
4693 }
4694
4695 // create the png data from post-deflated data
4696 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
4697 {
4698    int bytes = (depth == 16 ? 2 : 1);
4699    stbi__context *s = a->s;
4700    stbi__uint32 i,j,stride = x*out_n*bytes;
4701    stbi__uint32 img_len, img_width_bytes;
4702    stbi_uc *filter_buf;
4703    int all_ok = 1;
4704    int k;
4705    int img_n = s->img_n; // copy it into a local for later
4706
4707    int output_bytes = out_n*bytes;
4708    int filter_bytes = img_n*bytes;
4709    int width = x;
4710
4711    STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
4712    a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
4713    if (!a->out) return stbi__err("outofmem", "Out of memory");
4714
4715    // note: error exits here don't need to clean up a->out individually,
4716    // stbi__do_png always does on error.
4717    if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
4718    img_width_bytes = (((img_n * x * depth) + 7) >> 3);
4719    if (!stbi__mad2sizes_valid(img_width_bytes, y, img_width_bytes)) return stbi__err("too large", "Corrupt PNG");
4720    img_len = (img_width_bytes + 1) * y;
4721
4722    // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
4723    // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
4724    // so just check for raw_len < img_len always.
4725    if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
4726
4727    // Allocate two scan lines worth of filter workspace buffer.
4728    filter_buf = (stbi_uc *) stbi__malloc_mad2(img_width_bytes, 2, 0);
4729    if (!filter_buf) return stbi__err("outofmem", "Out of memory");
4730
4731    // Filtering for low-bit-depth images
4732    if (depth < 8) {
4733       filter_bytes = 1;
4734       width = img_width_bytes;
4735    }
4736
4737    for (j=0; j < y; ++j) {
4738       // cur/prior filter buffers alternate
4739       stbi_uc *cur = filter_buf + (j & 1)*img_width_bytes;
4740       stbi_uc *prior = filter_buf + (~j & 1)*img_width_bytes;
4741       stbi_uc *dest = a->out + stride*j;
4742       int nk = width * filter_bytes;
4743       int filter = *raw++;
4744
4745       // check filter type
4746       if (filter > 4) {
4747          all_ok = stbi__err("invalid filter","Corrupt PNG");
4748          break;
4749       }
4750
4751       // if first row, use special filter that doesn't sample previous row
4752       if (j == 0) filter = first_row_filter[filter];
4753
4754       // perform actual filtering
4755       switch (filter) {
4756       case STBI__F_none:
4757          memcpy(cur, raw, nk);
4758          break;
4759       case STBI__F_sub:
4760          memcpy(cur, raw, filter_bytes);
4761          for (k = filter_bytes; k < nk; ++k)
4762             cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]);
4763          break;
4764       case STBI__F_up:
4765          for (k = 0; k < nk; ++k)
4766             cur[k] = STBI__BYTECAST(raw[k] + prior[k]);
4767          break;
4768       case STBI__F_avg:
4769          for (k = 0; k < filter_bytes; ++k)
4770             cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1));
4771          for (k = filter_bytes; k < nk; ++k)
4772             cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1));
4773          break;
4774       case STBI__F_paeth:
4775          for (k = 0; k < filter_bytes; ++k)
4776             cur[k] = STBI__BYTECAST(raw[k] + prior[k]); // prior[k] == stbi__paeth(0,prior[k],0)
4777          for (k = filter_bytes; k < nk; ++k)
4778             cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes], prior[k], prior[k-filter_bytes]));
4779          break;
4780       case STBI__F_avg_first:
4781          memcpy(cur, raw, filter_bytes);
4782          for (k = filter_bytes; k < nk; ++k)
4783             cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1));
4784          break;
4785       }
4786
4787       raw += nk;
4788
4789       // expand decoded bits in cur to dest, also adding an extra alpha channel if desired
4790       if (depth < 8) {
4791          stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4792          stbi_uc *in = cur;
4793          stbi_uc *out = dest;
4794          stbi_uc inb = 0;
4795          stbi__uint32 nsmp = x*img_n;
4796
4797          // expand bits to bytes first
4798          if (depth == 4) {
4799             for (i=0; i < nsmp; ++i) {
4800                if ((i & 1) == 0) inb = *in++;
4801                *out++ = scale * (inb >> 4);
4802                inb <<= 4;
4803             }
4804          } else if (depth == 2) {
4805             for (i=0; i < nsmp; ++i) {
4806                if ((i & 3) == 0) inb = *in++;
4807                *out++ = scale * (inb >> 6);
4808                inb <<= 2;
4809             }
4810          } else {
4811             STBI_ASSERT(depth == 1);
4812             for (i=0; i < nsmp; ++i) {
4813                if ((i & 7) == 0) inb = *in++;
4814                *out++ = scale * (inb >> 7);
4815                inb <<= 1;
4816             }
4817          }
4818
4819          // insert alpha=255 values if desired
4820          if (img_n != out_n)
4821             stbi__create_png_alpha_expand8(dest, dest, x, img_n);
4822       } else if (depth == 8) {
4823          if (img_n == out_n)
4824             memcpy(dest, cur, x*img_n);
4825          else
4826             stbi__create_png_alpha_expand8(dest, cur, x, img_n);
4827       } else if (depth == 16) {
4828          // convert the image data from big-endian to platform-native
4829          stbi__uint16 *dest16 = (stbi__uint16*)dest;
4830          stbi__uint32 nsmp = x*img_n;
4831
4832          if (img_n == out_n) {
4833             for (i = 0; i < nsmp; ++i, ++dest16, cur += 2)
4834                *dest16 = (cur[0] << 8) | cur[1];
4835          } else {
4836             STBI_ASSERT(img_n+1 == out_n);
4837             if (img_n == 1) {
4838                for (i = 0; i < x; ++i, dest16 += 2, cur += 2) {
4839                   dest16[0] = (cur[0] << 8) | cur[1];
4840                   dest16[1] = 0xffff;
4841                }
4842             } else {
4843                STBI_ASSERT(img_n == 3);
4844                for (i = 0; i < x; ++i, dest16 += 4, cur += 6) {
4845                   dest16[0] = (cur[0] << 8) | cur[1];
4846                   dest16[1] = (cur[2] << 8) | cur[3];
4847                   dest16[2] = (cur[4] << 8) | cur[5];
4848                   dest16[3] = 0xffff;
4849                }
4850             }
4851          }
4852       }
4853    }
4854
4855    STBI_FREE(filter_buf);
4856    if (!all_ok) return 0;
4857
4858    return 1;
4859 }
4860
4861 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4862 {
4863    int bytes = (depth == 16 ? 2 : 1);
4864    int out_bytes = out_n * bytes;
4865    stbi_uc *final;
4866    int p;
4867    if (!interlaced)
4868       return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4869
4870    // de-interlacing
4871    final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
4872    if (!final) return stbi__err("outofmem", "Out of memory");
4873    for (p=0; p < 7; ++p) {
4874       int xorig[] = { 0,4,0,2,0,1,0 };
4875       int yorig[] = { 0,0,4,0,2,0,1 };
4876       int xspc[]  = { 8,8,4,4,2,2,1 };
4877       int yspc[]  = { 8,8,8,4,4,2,2 };
4878       int i,j,x,y;
4879       // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4880       x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4881       y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4882       if (x && y) {
4883          stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4884          if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4885             STBI_FREE(final);
4886             return 0;
4887          }
4888          for (j=0; j < y; ++j) {
4889             for (i=0; i < x; ++i) {
4890                int out_y = j*yspc[p]+yorig[p];
4891                int out_x = i*xspc[p]+xorig[p];
4892                memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
4893                       a->out + (j*x+i)*out_bytes, out_bytes);
4894             }
4895          }
4896          STBI_FREE(a->out);
4897          image_data += img_len;
4898          image_data_len -= img_len;
4899       }
4900    }
4901    a->out = final;
4902
4903    return 1;
4904 }
4905
4906 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4907 {
4908    stbi__context *s = z->s;
4909    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4910    stbi_uc *p = z->out;
4911
4912    // compute color-based transparency, assuming we've
4913    // already got 255 as the alpha value in the output
4914    STBI_ASSERT(out_n == 2 || out_n == 4);
4915
4916    if (out_n == 2) {
4917       for (i=0; i < pixel_count; ++i) {
4918          p[1] = (p[0] == tc[0] ? 0 : 255);
4919          p += 2;
4920       }
4921    } else {
4922       for (i=0; i < pixel_count; ++i) {
4923          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4924             p[3] = 0;
4925          p += 4;
4926       }
4927    }
4928    return 1;
4929 }
4930
4931 static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
4932 {
4933    stbi__context *s = z->s;
4934    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4935    stbi__uint16 *p = (stbi__uint16*) z->out;
4936
4937    // compute color-based transparency, assuming we've
4938    // already got 65535 as the alpha value in the output
4939    STBI_ASSERT(out_n == 2 || out_n == 4);
4940
4941    if (out_n == 2) {
4942       for (i = 0; i < pixel_count; ++i) {
4943          p[1] = (p[0] == tc[0] ? 0 : 65535);
4944          p += 2;
4945       }
4946    } else {
4947       for (i = 0; i < pixel_count; ++i) {
4948          if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4949             p[3] = 0;
4950          p += 4;
4951       }
4952    }
4953    return 1;
4954 }
4955
4956 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4957 {
4958    stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4959    stbi_uc *p, *temp_out, *orig = a->out;
4960
4961    p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
4962    if (p == NULL) return stbi__err("outofmem", "Out of memory");
4963
4964    // between here and free(out) below, exitting would leak
4965    temp_out = p;
4966
4967    if (pal_img_n == 3) {
4968       for (i=0; i < pixel_count; ++i) {
4969          int n = orig[i]*4;
4970          p[0] = palette[n  ];
4971          p[1] = palette[n+1];
4972          p[2] = palette[n+2];
4973          p += 3;
4974       }
4975    } else {
4976       for (i=0; i < pixel_count; ++i) {
4977          int n = orig[i]*4;
4978          p[0] = palette[n  ];
4979          p[1] = palette[n+1];
4980          p[2] = palette[n+2];
4981          p[3] = palette[n+3];
4982          p += 4;
4983       }
4984    }
4985    STBI_FREE(a->out);
4986    a->out = temp_out;
4987
4988    STBI_NOTUSED(len);
4989
4990    return 1;
4991 }
4992
4993 static int stbi__unpremultiply_on_load_global = 0;
4994 static int stbi__de_iphone_flag_global = 0;
4995
4996 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4997 {
4998    stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply;
4999 }
5000
5001 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
5002 {
5003    stbi__de_iphone_flag_global = flag_true_if_should_convert;
5004 }
5005
5006 #ifndef STBI_THREAD_LOCAL
5007 #define stbi__unpremultiply_on_load  stbi__unpremultiply_on_load_global
5008 #define stbi__de_iphone_flag  stbi__de_iphone_flag_global
5009 #else
5010 static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local, stbi__unpremultiply_on_load_set;
5011 static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local, stbi__de_iphone_flag_set;
5012
5013 STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply)
5014 {
5015    stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply;
5016    stbi__unpremultiply_on_load_set = 1;
5017 }
5018
5019 STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert)
5020 {
5021    stbi__de_iphone_flag_local = flag_true_if_should_convert;
5022    stbi__de_iphone_flag_set = 1;
5023 }
5024
5025 #define stbi__unpremultiply_on_load  (stbi__unpremultiply_on_load_set           \
5026                                        ? stbi__unpremultiply_on_load_local      \
5027                                        : stbi__unpremultiply_on_load_global)
5028 #define stbi__de_iphone_flag  (stbi__de_iphone_flag_set                         \
5029                                 ? stbi__de_iphone_flag_local                    \
5030                                 : stbi__de_iphone_flag_global)
5031 #endif // STBI_THREAD_LOCAL
5032
5033 static void stbi__de_iphone(stbi__png *z)
5034 {
5035    stbi__context *s = z->s;
5036    stbi__uint32 i, pixel_count = s->img_x * s->img_y;
5037    stbi_uc *p = z->out;
5038
5039    if (s->img_out_n == 3) {  // convert bgr to rgb
5040       for (i=0; i < pixel_count; ++i) {
5041          stbi_uc t = p[0];
5042          p[0] = p[2];
5043          p[2] = t;
5044          p += 3;
5045       }
5046    } else {
5047       STBI_ASSERT(s->img_out_n == 4);
5048       if (stbi__unpremultiply_on_load) {
5049          // convert bgr to rgb and unpremultiply
5050          for (i=0; i < pixel_count; ++i) {
5051             stbi_uc a = p[3];
5052             stbi_uc t = p[0];
5053             if (a) {
5054                stbi_uc half = a / 2;
5055                p[0] = (p[2] * 255 + half) / a;
5056                p[1] = (p[1] * 255 + half) / a;
5057                p[2] = ( t   * 255 + half) / a;
5058             } else {
5059                p[0] = p[2];
5060                p[2] = t;
5061             }
5062             p += 4;
5063          }
5064       } else {
5065          // convert bgr to rgb
5066          for (i=0; i < pixel_count; ++i) {
5067             stbi_uc t = p[0];
5068             p[0] = p[2];
5069             p[2] = t;
5070             p += 4;
5071          }
5072       }
5073    }
5074 }
5075
5076 #define STBI__PNG_TYPE(a,b,c,d)  (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
5077
5078 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
5079 {
5080    stbi_uc palette[1024], pal_img_n=0;
5081    stbi_uc has_trans=0, tc[3]={0};
5082    stbi__uint16 tc16[3];
5083    stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
5084    int first=1,k,interlace=0, color=0, is_iphone=0;
5085    stbi__context *s = z->s;
5086
5087    z->expanded = NULL;
5088    z->idata = NULL;
5089    z->out = NULL;
5090
5091    if (!stbi__check_png_header(s)) return 0;
5092
5093    if (scan == STBI__SCAN_type) return 1;
5094
5095    for (;;) {
5096       stbi__pngchunk c = stbi__get_chunk_header(s);
5097       switch (c.type) {
5098          case STBI__PNG_TYPE('C','g','B','I'):
5099             is_iphone = 1;
5100             stbi__skip(s, c.length);
5101             break;
5102          case STBI__PNG_TYPE('I','H','D','R'): {
5103             int comp,filter;
5104             if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
5105             first = 0;
5106             if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
5107             s->img_x = stbi__get32be(s);
5108             s->img_y = stbi__get32be(s);
5109             if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
5110             if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
5111             z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
5112             color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
5113             if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
5114             if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
5115             comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
5116             filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
5117             interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
5118             if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
5119             if (!pal_img_n) {
5120                s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
5121                if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
5122             } else {
5123                // if paletted, then pal_n is our final components, and
5124                // img_n is # components to decompress/filter.
5125                s->img_n = 1;
5126                if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
5127             }
5128             // even with SCAN_header, have to scan to see if we have a tRNS
5129             break;
5130          }
5131
5132          case STBI__PNG_TYPE('P','L','T','E'):  {
5133             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5134             if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
5135             pal_len = c.length / 3;
5136             if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
5137             for (i=0; i < pal_len; ++i) {
5138                palette[i*4+0] = stbi__get8(s);
5139                palette[i*4+1] = stbi__get8(s);
5140                palette[i*4+2] = stbi__get8(s);
5141                palette[i*4+3] = 255;
5142             }
5143             break;
5144          }
5145
5146          case STBI__PNG_TYPE('t','R','N','S'): {
5147             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5148             if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
5149             if (pal_img_n) {
5150                if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
5151                if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
5152                if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
5153                pal_img_n = 4;
5154                for (i=0; i < c.length; ++i)
5155                   palette[i*4+3] = stbi__get8(s);
5156             } else {
5157                if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
5158                if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
5159                has_trans = 1;
5160                // non-paletted with tRNS = constant alpha. if header-scanning, we can stop now.
5161                if (scan == STBI__SCAN_header) { ++s->img_n; return 1; }
5162                if (z->depth == 16) {
5163                   for (k = 0; k < s->img_n && k < 3; ++k) // extra loop test to suppress false GCC warning
5164                      tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
5165                } else {
5166                   for (k = 0; k < s->img_n && k < 3; ++k)
5167                      tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
5168                }
5169             }
5170             break;
5171          }
5172
5173          case STBI__PNG_TYPE('I','D','A','T'): {
5174             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5175             if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
5176             if (scan == STBI__SCAN_header) {
5177                // header scan definitely stops at first IDAT
5178                if (pal_img_n)
5179                   s->img_n = pal_img_n;
5180                return 1;
5181             }
5182             if (c.length > (1u << 30)) return stbi__err("IDAT size limit", "IDAT section larger than 2^30 bytes");
5183             if ((int)(ioff + c.length) < (int)ioff) return 0;
5184             if (ioff + c.length > idata_limit) {
5185                stbi__uint32 idata_limit_old = idata_limit;
5186                stbi_uc *p;
5187                if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
5188                while (ioff + c.length > idata_limit)
5189                   idata_limit *= 2;
5190                STBI_NOTUSED(idata_limit_old);
5191                p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
5192                z->idata = p;
5193             }
5194             if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
5195             ioff += c.length;
5196             break;
5197          }
5198
5199          case STBI__PNG_TYPE('I','E','N','D'): {
5200             stbi__uint32 raw_len, bpl;
5201             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5202             if (scan != STBI__SCAN_load) return 1;
5203             if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
5204             // initial guess for decoded data size to avoid unnecessary reallocs
5205             bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
5206             raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
5207             z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
5208             if (z->expanded == NULL) return 0; // zlib should set error
5209             STBI_FREE(z->idata); z->idata = NULL;
5210             if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
5211                s->img_out_n = s->img_n+1;
5212             else
5213                s->img_out_n = s->img_n;
5214             if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
5215             if (has_trans) {
5216                if (z->depth == 16) {
5217                   if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
5218                } else {
5219                   if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
5220                }
5221             }
5222             if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
5223                stbi__de_iphone(z);
5224             if (pal_img_n) {
5225                // pal_img_n == 3 or 4
5226                s->img_n = pal_img_n; // record the actual colors we had
5227                s->img_out_n = pal_img_n;
5228                if (req_comp >= 3) s->img_out_n = req_comp;
5229                if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
5230                   return 0;
5231             } else if (has_trans) {
5232                // non-paletted image with tRNS -> source image has (constant) alpha
5233                ++s->img_n;
5234             }
5235             STBI_FREE(z->expanded); z->expanded = NULL;
5236             // end of PNG chunk, read and skip CRC
5237             stbi__get32be(s);
5238             return 1;
5239          }
5240
5241          default:
5242             // if critical, fail
5243             if (first) return stbi__err("first not IHDR", "Corrupt PNG");
5244             if ((c.type & (1 << 29)) == 0) {
5245                #ifndef STBI_NO_FAILURE_STRINGS
5246                // not threadsafe
5247                static char invalid_chunk[] = "XXXX PNG chunk not known";
5248                invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
5249                invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
5250                invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
5251                invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
5252                #endif
5253                return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
5254             }
5255             stbi__skip(s, c.length);
5256             break;
5257       }
5258       // end of PNG chunk, read and skip CRC
5259       stbi__get32be(s);
5260    }
5261 }
5262
5263 static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
5264 {
5265    void *result=NULL;
5266    if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
5267    if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
5268       if (p->depth <= 8)
5269          ri->bits_per_channel = 8;
5270       else if (p->depth == 16)
5271          ri->bits_per_channel = 16;
5272       else
5273          return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth");
5274       result = p->out;
5275       p->out = NULL;
5276       if (req_comp && req_comp != p->s->img_out_n) {
5277          if (ri->bits_per_channel == 8)
5278             result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
5279          else
5280             result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
5281          p->s->img_out_n = req_comp;
5282          if (result == NULL) return result;
5283       }
5284       *x = p->s->img_x;
5285       *y = p->s->img_y;
5286       if (n) *n = p->s->img_n;
5287    }
5288    STBI_FREE(p->out);      p->out      = NULL;
5289    STBI_FREE(p->expanded); p->expanded = NULL;
5290    STBI_FREE(p->idata);    p->idata    = NULL;
5291
5292    return result;
5293 }
5294
5295 static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5296 {
5297    stbi__png p;
5298    p.s = s;
5299    return stbi__do_png(&p, x,y,comp,req_comp, ri);
5300 }
5301
5302 static int stbi__png_test(stbi__context *s)
5303 {
5304    int r;
5305    r = stbi__check_png_header(s);
5306    stbi__rewind(s);
5307    return r;
5308 }
5309
5310 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
5311 {
5312    if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
5313       stbi__rewind( p->s );
5314       return 0;
5315    }
5316    if (x) *x = p->s->img_x;
5317    if (y) *y = p->s->img_y;
5318    if (comp) *comp = p->s->img_n;
5319    return 1;
5320 }
5321
5322 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
5323 {
5324    stbi__png p;
5325    p.s = s;
5326    return stbi__png_info_raw(&p, x, y, comp);
5327 }
5328
5329 static int stbi__png_is16(stbi__context *s)
5330 {
5331    stbi__png p;
5332    p.s = s;
5333    if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
5334            return 0;
5335    if (p.depth != 16) {
5336       stbi__rewind(p.s);
5337       return 0;
5338    }
5339    return 1;
5340 }
5341 #endif
5342
5343 // Microsoft/Windows BMP image
5344
5345 #ifndef STBI_NO_BMP
5346 static int stbi__bmp_test_raw(stbi__context *s)
5347 {
5348    int r;
5349    int sz;
5350    if (stbi__get8(s) != 'B') return 0;
5351    if (stbi__get8(s) != 'M') return 0;
5352    stbi__get32le(s); // discard filesize
5353    stbi__get16le(s); // discard reserved
5354    stbi__get16le(s); // discard reserved
5355    stbi__get32le(s); // discard data offset
5356    sz = stbi__get32le(s);
5357    r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
5358    return r;
5359 }
5360
5361 static int stbi__bmp_test(stbi__context *s)
5362 {
5363    int r = stbi__bmp_test_raw(s);
5364    stbi__rewind(s);
5365    return r;
5366 }
5367
5368
5369 // returns 0..31 for the highest set bit
5370 static int stbi__high_bit(unsigned int z)
5371 {
5372    int n=0;
5373    if (z == 0) return -1;
5374    if (z >= 0x10000) { n += 16; z >>= 16; }
5375    if (z >= 0x00100) { n +=  8; z >>=  8; }
5376    if (z >= 0x00010) { n +=  4; z >>=  4; }
5377    if (z >= 0x00004) { n +=  2; z >>=  2; }
5378    if (z >= 0x00002) { n +=  1;/* >>=  1;*/ }
5379    return n;
5380 }
5381
5382 static int stbi__bitcount(unsigned int a)
5383 {
5384    a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
5385    a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
5386    a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
5387    a = (a + (a >> 8)); // max 16 per 8 bits
5388    a = (a + (a >> 16)); // max 32 per 8 bits
5389    return a & 0xff;
5390 }
5391
5392 // extract an arbitrarily-aligned N-bit value (N=bits)
5393 // from v, and then make it 8-bits long and fractionally
5394 // extend it to full full range.
5395 static int stbi__shiftsigned(unsigned int v, int shift, int bits)
5396 {
5397    static unsigned int mul_table[9] = {
5398       0,
5399       0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
5400       0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
5401    };
5402    static unsigned int shift_table[9] = {
5403       0, 0,0,1,0,2,4,6,0,
5404    };
5405    if (shift < 0)
5406       v <<= -shift;
5407    else
5408       v >>= shift;
5409    STBI_ASSERT(v < 256);
5410    v >>= (8-bits);
5411    STBI_ASSERT(bits >= 0 && bits <= 8);
5412    return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
5413 }
5414
5415 typedef struct
5416 {
5417    int bpp, offset, hsz;
5418    unsigned int mr,mg,mb,ma, all_a;
5419    int extra_read;
5420 } stbi__bmp_data;
5421
5422 static int stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress)
5423 {
5424    // BI_BITFIELDS specifies masks explicitly, don't override
5425    if (compress == 3)
5426       return 1;
5427
5428    if (compress == 0) {
5429       if (info->bpp == 16) {
5430          info->mr = 31u << 10;
5431          info->mg = 31u <<  5;
5432          info->mb = 31u <<  0;
5433       } else if (info->bpp == 32) {
5434          info->mr = 0xffu << 16;
5435          info->mg = 0xffu <<  8;
5436          info->mb = 0xffu <<  0;
5437          info->ma = 0xffu << 24;
5438          info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
5439       } else {
5440          // otherwise, use defaults, which is all-0
5441          info->mr = info->mg = info->mb = info->ma = 0;
5442       }
5443       return 1;
5444    }
5445    return 0; // error
5446 }
5447
5448 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
5449 {
5450    int hsz;
5451    if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
5452    stbi__get32le(s); // discard filesize
5453    stbi__get16le(s); // discard reserved
5454    stbi__get16le(s); // discard reserved
5455    info->offset = stbi__get32le(s);
5456    info->hsz = hsz = stbi__get32le(s);
5457    info->mr = info->mg = info->mb = info->ma = 0;
5458    info->extra_read = 14;
5459
5460    if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP");
5461
5462    if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
5463    if (hsz == 12) {
5464       s->img_x = stbi__get16le(s);
5465       s->img_y = stbi__get16le(s);
5466    } else {
5467       s->img_x = stbi__get32le(s);
5468       s->img_y = stbi__get32le(s);
5469    }
5470    if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
5471    info->bpp = stbi__get16le(s);
5472    if (hsz != 12) {
5473       int compress = stbi__get32le(s);
5474       if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
5475       if (compress >= 4) return stbi__errpuc("BMP JPEG/PNG", "BMP type not supported: unsupported compression"); // this includes PNG/JPEG modes
5476       if (compress == 3 && info->bpp != 16 && info->bpp != 32) return stbi__errpuc("bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel
5477       stbi__get32le(s); // discard sizeof
5478       stbi__get32le(s); // discard hres
5479       stbi__get32le(s); // discard vres
5480       stbi__get32le(s); // discard colorsused
5481       stbi__get32le(s); // discard max important
5482       if (hsz == 40 || hsz == 56) {
5483          if (hsz == 56) {
5484             stbi__get32le(s);
5485             stbi__get32le(s);
5486             stbi__get32le(s);
5487             stbi__get32le(s);
5488          }
5489          if (info->bpp == 16 || info->bpp == 32) {
5490             if (compress == 0) {
5491                stbi__bmp_set_mask_defaults(info, compress);
5492             } else if (compress == 3) {
5493                info->mr = stbi__get32le(s);
5494                info->mg = stbi__get32le(s);
5495                info->mb = stbi__get32le(s);
5496                info->extra_read += 12;
5497                // not documented, but generated by photoshop and handled by mspaint
5498                if (info->mr == info->mg && info->mg == info->mb) {
5499                   // ?!?!?
5500                   return stbi__errpuc("bad BMP", "bad BMP");
5501                }
5502             } else
5503                return stbi__errpuc("bad BMP", "bad BMP");
5504          }
5505       } else {
5506          // V4/V5 header
5507          int i;
5508          if (hsz != 108 && hsz != 124)
5509             return stbi__errpuc("bad BMP", "bad BMP");
5510          info->mr = stbi__get32le(s);
5511          info->mg = stbi__get32le(s);
5512          info->mb = stbi__get32le(s);
5513          info->ma = stbi__get32le(s);
5514          if (compress != 3) // override mr/mg/mb unless in BI_BITFIELDS mode, as per docs
5515             stbi__bmp_set_mask_defaults(info, compress);
5516          stbi__get32le(s); // discard color space
5517          for (i=0; i < 12; ++i)
5518             stbi__get32le(s); // discard color space parameters
5519          if (hsz == 124) {
5520             stbi__get32le(s); // discard rendering intent
5521             stbi__get32le(s); // discard offset of profile data
5522             stbi__get32le(s); // discard size of profile data
5523             stbi__get32le(s); // discard reserved
5524          }
5525       }
5526    }
5527    return (void *) 1;
5528 }
5529
5530
5531 static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5532 {
5533    stbi_uc *out;
5534    unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
5535    stbi_uc pal[256][4];
5536    int psize=0,i,j,width;
5537    int flip_vertically, pad, target;
5538    stbi__bmp_data info;
5539    STBI_NOTUSED(ri);
5540
5541    info.all_a = 255;
5542    if (stbi__bmp_parse_header(s, &info) == NULL)
5543       return NULL; // error code already set
5544
5545    flip_vertically = ((int) s->img_y) > 0;
5546    s->img_y = abs((int) s->img_y);
5547
5548    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5549    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5550
5551    mr = info.mr;
5552    mg = info.mg;
5553    mb = info.mb;
5554    ma = info.ma;
5555    all_a = info.all_a;
5556
5557    if (info.hsz == 12) {
5558       if (info.bpp < 24)
5559          psize = (info.offset - info.extra_read - 24) / 3;
5560    } else {
5561       if (info.bpp < 16)
5562          psize = (info.offset - info.extra_read - info.hsz) >> 2;
5563    }
5564    if (psize == 0) {
5565       // accept some number of extra bytes after the header, but if the offset points either to before
5566       // the header ends or implies a large amount of extra data, reject the file as malformed
5567       int bytes_read_so_far = s->callback_already_read + (int)(s->img_buffer - s->img_buffer_original);
5568       int header_limit = 1024; // max we actually read is below 256 bytes currently.
5569       int extra_data_limit = 256*4; // what ordinarily goes here is a palette; 256 entries*4 bytes is its max size.
5570       if (bytes_read_so_far <= 0 || bytes_read_so_far > header_limit) {
5571          return stbi__errpuc("bad header", "Corrupt BMP");
5572       }
5573       // we established that bytes_read_so_far is positive and sensible.
5574       // the first half of this test rejects offsets that are either too small positives, or
5575       // negative, and guarantees that info.offset >= bytes_read_so_far > 0. this in turn
5576       // ensures the number computed in the second half of the test can't overflow.
5577       if (info.offset < bytes_read_so_far || info.offset - bytes_read_so_far > extra_data_limit) {
5578          return stbi__errpuc("bad offset", "Corrupt BMP");
5579       } else {
5580          stbi__skip(s, info.offset - bytes_read_so_far);
5581       }
5582    }
5583
5584    if (info.bpp == 24 && ma == 0xff000000)
5585       s->img_n = 3;
5586    else
5587       s->img_n = ma ? 4 : 3;
5588    if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
5589       target = req_comp;
5590    else
5591       target = s->img_n; // if they want monochrome, we'll post-convert
5592
5593    // sanity-check size
5594    if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
5595       return stbi__errpuc("too large", "Corrupt BMP");
5596
5597    out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
5598    if (!out) return stbi__errpuc("outofmem", "Out of memory");
5599    if (info.bpp < 16) {
5600       int z=0;
5601       if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
5602       for (i=0; i < psize; ++i) {
5603          pal[i][2] = stbi__get8(s);
5604          pal[i][1] = stbi__get8(s);
5605          pal[i][0] = stbi__get8(s);
5606          if (info.hsz != 12) stbi__get8(s);
5607          pal[i][3] = 255;
5608       }
5609       stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
5610       if (info.bpp == 1) width = (s->img_x + 7) >> 3;
5611       else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
5612       else if (info.bpp == 8) width = s->img_x;
5613       else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
5614       pad = (-width)&3;
5615       if (info.bpp == 1) {
5616          for (j=0; j < (int) s->img_y; ++j) {
5617             int bit_offset = 7, v = stbi__get8(s);
5618             for (i=0; i < (int) s->img_x; ++i) {
5619                int color = (v>>bit_offset)&0x1;
5620                out[z++] = pal[color][0];
5621                out[z++] = pal[color][1];
5622                out[z++] = pal[color][2];
5623                if (target == 4) out[z++] = 255;
5624                if (i+1 == (int) s->img_x) break;
5625                if((--bit_offset) < 0) {
5626                   bit_offset = 7;
5627                   v = stbi__get8(s);
5628                }
5629             }
5630             stbi__skip(s, pad);
5631          }
5632       } else {
5633          for (j=0; j < (int) s->img_y; ++j) {
5634             for (i=0; i < (int) s->img_x; i += 2) {
5635                int v=stbi__get8(s),v2=0;
5636                if (info.bpp == 4) {
5637                   v2 = v & 15;
5638                   v >>= 4;
5639                }
5640                out[z++] = pal[v][0];
5641                out[z++] = pal[v][1];
5642                out[z++] = pal[v][2];
5643                if (target == 4) out[z++] = 255;
5644                if (i+1 == (int) s->img_x) break;
5645                v = (info.bpp == 8) ? stbi__get8(s) : v2;
5646                out[z++] = pal[v][0];
5647                out[z++] = pal[v][1];
5648                out[z++] = pal[v][2];
5649                if (target == 4) out[z++] = 255;
5650             }
5651             stbi__skip(s, pad);
5652          }
5653       }
5654    } else {
5655       int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
5656       int z = 0;
5657       int easy=0;
5658       stbi__skip(s, info.offset - info.extra_read - info.hsz);
5659       if (info.bpp == 24) width = 3 * s->img_x;
5660       else if (info.bpp == 16) width = 2*s->img_x;
5661       else /* bpp = 32 and pad = 0 */ width=0;
5662       pad = (-width) & 3;
5663       if (info.bpp == 24) {
5664          easy = 1;
5665       } else if (info.bpp == 32) {
5666          if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
5667             easy = 2;
5668       }
5669       if (!easy) {
5670          if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
5671          // right shift amt to put high bit in position #7
5672          rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
5673          gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
5674          bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
5675          ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
5676          if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
5677       }
5678       for (j=0; j < (int) s->img_y; ++j) {
5679          if (easy) {
5680             for (i=0; i < (int) s->img_x; ++i) {
5681                unsigned char a;
5682                out[z+2] = stbi__get8(s);
5683                out[z+1] = stbi__get8(s);
5684                out[z+0] = stbi__get8(s);
5685                z += 3;
5686                a = (easy == 2 ? stbi__get8(s) : 255);
5687                all_a |= a;
5688                if (target == 4) out[z++] = a;
5689             }
5690          } else {
5691             int bpp = info.bpp;
5692             for (i=0; i < (int) s->img_x; ++i) {
5693                stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
5694                unsigned int a;
5695                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
5696                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
5697                out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
5698                a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
5699                all_a |= a;
5700                if (target == 4) out[z++] = STBI__BYTECAST(a);
5701             }
5702          }
5703          stbi__skip(s, pad);
5704       }
5705    }
5706
5707    // if alpha channel is all 0s, replace with all 255s
5708    if (target == 4 && all_a == 0)
5709       for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
5710          out[i] = 255;
5711
5712    if (flip_vertically) {
5713       stbi_uc t;
5714       for (j=0; j < (int) s->img_y>>1; ++j) {
5715          stbi_uc *p1 = out +      j     *s->img_x*target;
5716          stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
5717          for (i=0; i < (int) s->img_x*target; ++i) {
5718             t = p1[i]; p1[i] = p2[i]; p2[i] = t;
5719          }
5720       }
5721    }
5722
5723    if (req_comp && req_comp != target) {
5724       out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
5725       if (out == NULL) return out; // stbi__convert_format frees input on failure
5726    }
5727
5728    *x = s->img_x;
5729    *y = s->img_y;
5730    if (comp) *comp = s->img_n;
5731    return out;
5732 }
5733 #endif
5734
5735 // Targa Truevision - TGA
5736 // by Jonathan Dummer
5737 #ifndef STBI_NO_TGA
5738 // returns STBI_rgb or whatever, 0 on error
5739 static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
5740 {
5741    // only RGB or RGBA (incl. 16bit) or grey allowed
5742    if (is_rgb16) *is_rgb16 = 0;
5743    switch(bits_per_pixel) {
5744       case 8:  return STBI_grey;
5745       case 16: if(is_grey) return STBI_grey_alpha;
5746                // fallthrough
5747       case 15: if(is_rgb16) *is_rgb16 = 1;
5748                return STBI_rgb;
5749       case 24: // fallthrough
5750       case 32: return bits_per_pixel/8;
5751       default: return 0;
5752    }
5753 }
5754
5755 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
5756 {
5757     int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
5758     int sz, tga_colormap_type;
5759     stbi__get8(s);                   // discard Offset
5760     tga_colormap_type = stbi__get8(s); // colormap type
5761     if( tga_colormap_type > 1 ) {
5762         stbi__rewind(s);
5763         return 0;      // only RGB or indexed allowed
5764     }
5765     tga_image_type = stbi__get8(s); // image type
5766     if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
5767         if (tga_image_type != 1 && tga_image_type != 9) {
5768             stbi__rewind(s);
5769             return 0;
5770         }
5771         stbi__skip(s,4);       // skip index of first colormap entry and number of entries
5772         sz = stbi__get8(s);    //   check bits per palette color entry
5773         if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
5774             stbi__rewind(s);
5775             return 0;
5776         }
5777         stbi__skip(s,4);       // skip image x and y origin
5778         tga_colormap_bpp = sz;
5779     } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
5780         if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
5781             stbi__rewind(s);
5782             return 0; // only RGB or grey allowed, +/- RLE
5783         }
5784         stbi__skip(s,9); // skip colormap specification and image x/y origin
5785         tga_colormap_bpp = 0;
5786     }
5787     tga_w = stbi__get16le(s);
5788     if( tga_w < 1 ) {
5789         stbi__rewind(s);
5790         return 0;   // test width
5791     }
5792     tga_h = stbi__get16le(s);
5793     if( tga_h < 1 ) {
5794         stbi__rewind(s);
5795         return 0;   // test height
5796     }
5797     tga_bits_per_pixel = stbi__get8(s); // bits per pixel
5798     stbi__get8(s); // ignore alpha bits
5799     if (tga_colormap_bpp != 0) {
5800         if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
5801             // when using a colormap, tga_bits_per_pixel is the size of the indexes
5802             // I don't think anything but 8 or 16bit indexes makes sense
5803             stbi__rewind(s);
5804             return 0;
5805         }
5806         tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
5807     } else {
5808         tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
5809     }
5810     if(!tga_comp) {
5811       stbi__rewind(s);
5812       return 0;
5813     }
5814     if (x) *x = tga_w;
5815     if (y) *y = tga_h;
5816     if (comp) *comp = tga_comp;
5817     return 1;                   // seems to have passed everything
5818 }
5819
5820 static int stbi__tga_test(stbi__context *s)
5821 {
5822    int res = 0;
5823    int sz, tga_color_type;
5824    stbi__get8(s);      //   discard Offset
5825    tga_color_type = stbi__get8(s);   //   color type
5826    if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
5827    sz = stbi__get8(s);   //   image type
5828    if ( tga_color_type == 1 ) { // colormapped (paletted) image
5829       if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
5830       stbi__skip(s,4);       // skip index of first colormap entry and number of entries
5831       sz = stbi__get8(s);    //   check bits per palette color entry
5832       if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
5833       stbi__skip(s,4);       // skip image x and y origin
5834    } else { // "normal" image w/o colormap
5835       if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
5836       stbi__skip(s,9); // skip colormap specification and image x/y origin
5837    }
5838    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
5839    if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
5840    sz = stbi__get8(s);   //   bits per pixel
5841    if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
5842    if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
5843
5844    res = 1; // if we got this far, everything's good and we can return 1 instead of 0
5845
5846 errorEnd:
5847    stbi__rewind(s);
5848    return res;
5849 }
5850
5851 // read 16bit value and convert to 24bit RGB
5852 static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
5853 {
5854    stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
5855    stbi__uint16 fiveBitMask = 31;
5856    // we have 3 channels with 5bits each
5857    int r = (px >> 10) & fiveBitMask;
5858    int g = (px >> 5) & fiveBitMask;
5859    int b = px & fiveBitMask;
5860    // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
5861    out[0] = (stbi_uc)((r * 255)/31);
5862    out[1] = (stbi_uc)((g * 255)/31);
5863    out[2] = (stbi_uc)((b * 255)/31);
5864
5865    // some people claim that the most significant bit might be used for alpha
5866    // (possibly if an alpha-bit is set in the "image descriptor byte")
5867    // but that only made 16bit test images completely translucent..
5868    // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
5869 }
5870
5871 static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5872 {
5873    //   read in the TGA header stuff
5874    int tga_offset = stbi__get8(s);
5875    int tga_indexed = stbi__get8(s);
5876    int tga_image_type = stbi__get8(s);
5877    int tga_is_RLE = 0;
5878    int tga_palette_start = stbi__get16le(s);
5879    int tga_palette_len = stbi__get16le(s);
5880    int tga_palette_bits = stbi__get8(s);
5881    int tga_x_origin = stbi__get16le(s);
5882    int tga_y_origin = stbi__get16le(s);
5883    int tga_width = stbi__get16le(s);
5884    int tga_height = stbi__get16le(s);
5885    int tga_bits_per_pixel = stbi__get8(s);
5886    int tga_comp, tga_rgb16=0;
5887    int tga_inverted = stbi__get8(s);
5888    // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
5889    //   image data
5890    unsigned char *tga_data;
5891    unsigned char *tga_palette = NULL;
5892    int i, j;
5893    unsigned char raw_data[4] = {0};
5894    int RLE_count = 0;
5895    int RLE_repeating = 0;
5896    int read_next_pixel = 1;
5897    STBI_NOTUSED(ri);
5898    STBI_NOTUSED(tga_x_origin); // @TODO
5899    STBI_NOTUSED(tga_y_origin); // @TODO
5900
5901    if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5902    if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
5903
5904    //   do a tiny bit of precessing
5905    if ( tga_image_type >= 8 )
5906    {
5907       tga_image_type -= 8;
5908       tga_is_RLE = 1;
5909    }
5910    tga_inverted = 1 - ((tga_inverted >> 5) & 1);
5911
5912    //   If I'm paletted, then I'll use the number of bits from the palette
5913    if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
5914    else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
5915
5916    if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
5917       return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
5918
5919    //   tga info
5920    *x = tga_width;
5921    *y = tga_height;
5922    if (comp) *comp = tga_comp;
5923
5924    if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
5925       return stbi__errpuc("too large", "Corrupt TGA");
5926
5927    tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
5928    if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
5929
5930    // skip to the data's starting position (offset usually = 0)
5931    stbi__skip(s, tga_offset );
5932
5933    if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
5934       for (i=0; i < tga_height; ++i) {
5935          int row = tga_inverted ? tga_height -i - 1 : i;
5936          stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
5937          stbi__getn(s, tga_row, tga_width * tga_comp);
5938       }
5939    } else  {
5940       //   do I need to load a palette?
5941       if ( tga_indexed)
5942       {
5943          if (tga_palette_len == 0) {  /* you have to have at least one entry! */
5944             STBI_FREE(tga_data);
5945             return stbi__errpuc("bad palette", "Corrupt TGA");
5946          }
5947
5948          //   any data to skip? (offset usually = 0)
5949          stbi__skip(s, tga_palette_start );
5950          //   load the palette
5951          tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
5952          if (!tga_palette) {
5953             STBI_FREE(tga_data);
5954             return stbi__errpuc("outofmem", "Out of memory");
5955          }
5956          if (tga_rgb16) {
5957             stbi_uc *pal_entry = tga_palette;
5958             STBI_ASSERT(tga_comp == STBI_rgb);
5959             for (i=0; i < tga_palette_len; ++i) {
5960                stbi__tga_read_rgb16(s, pal_entry);
5961                pal_entry += tga_comp;
5962             }
5963          } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
5964                STBI_FREE(tga_data);
5965                STBI_FREE(tga_palette);
5966                return stbi__errpuc("bad palette", "Corrupt TGA");
5967          }
5968       }
5969       //   load the data
5970       for (i=0; i < tga_width * tga_height; ++i)
5971       {
5972          //   if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
5973          if ( tga_is_RLE )
5974          {
5975             if ( RLE_count == 0 )
5976             {
5977                //   yep, get the next byte as a RLE command
5978                int RLE_cmd = stbi__get8(s);
5979                RLE_count = 1 + (RLE_cmd & 127);
5980                RLE_repeating = RLE_cmd >> 7;
5981                read_next_pixel = 1;
5982             } else if ( !RLE_repeating )
5983             {
5984                read_next_pixel = 1;
5985             }
5986          } else
5987          {
5988             read_next_pixel = 1;
5989          }
5990          //   OK, if I need to read a pixel, do it now
5991          if ( read_next_pixel )
5992          {
5993             //   load however much data we did have
5994             if ( tga_indexed )
5995             {
5996                // read in index, then perform the lookup
5997                int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
5998                if ( pal_idx >= tga_palette_len ) {
5999                   // invalid index
6000                   pal_idx = 0;
6001                }
6002                pal_idx *= tga_comp;
6003                for (j = 0; j < tga_comp; ++j) {
6004                   raw_data[j] = tga_palette[pal_idx+j];
6005                }
6006             } else if(tga_rgb16) {
6007                STBI_ASSERT(tga_comp == STBI_rgb);
6008                stbi__tga_read_rgb16(s, raw_data);
6009             } else {
6010                //   read in the data raw
6011                for (j = 0; j < tga_comp; ++j) {
6012                   raw_data[j] = stbi__get8(s);
6013                }
6014             }
6015             //   clear the reading flag for the next pixel
6016             read_next_pixel = 0;
6017          } // end of reading a pixel
6018
6019          // copy data
6020          for (j = 0; j < tga_comp; ++j)
6021            tga_data[i*tga_comp+j] = raw_data[j];
6022
6023          //   in case we're in RLE mode, keep counting down
6024          --RLE_count;
6025       }
6026       //   do I need to invert the image?
6027       if ( tga_inverted )
6028       {
6029          for (j = 0; j*2 < tga_height; ++j)
6030          {
6031             int index1 = j * tga_width * tga_comp;
6032             int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
6033             for (i = tga_width * tga_comp; i > 0; --i)
6034             {
6035                unsigned char temp = tga_data[index1];
6036                tga_data[index1] = tga_data[index2];
6037                tga_data[index2] = temp;
6038                ++index1;
6039                ++index2;
6040             }
6041          }
6042       }
6043       //   clear my palette, if I had one
6044       if ( tga_palette != NULL )
6045       {
6046          STBI_FREE( tga_palette );
6047       }
6048    }
6049
6050    // swap RGB - if the source data was RGB16, it already is in the right order
6051    if (tga_comp >= 3 && !tga_rgb16)
6052    {
6053       unsigned char* tga_pixel = tga_data;
6054       for (i=0; i < tga_width * tga_height; ++i)
6055       {
6056          unsigned char temp = tga_pixel[0];
6057          tga_pixel[0] = tga_pixel[2];
6058          tga_pixel[2] = temp;
6059          tga_pixel += tga_comp;
6060       }
6061    }
6062
6063    // convert to target component count
6064    if (req_comp && req_comp != tga_comp)
6065       tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
6066
6067    //   the things I do to get rid of an error message, and yet keep
6068    //   Microsoft's C compilers happy... [8^(
6069    tga_palette_start = tga_palette_len = tga_palette_bits =
6070          tga_x_origin = tga_y_origin = 0;
6071    STBI_NOTUSED(tga_palette_start);
6072    //   OK, done
6073    return tga_data;
6074 }
6075 #endif
6076
6077 // *************************************************************************************************
6078 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
6079
6080 #ifndef STBI_NO_PSD
6081 static int stbi__psd_test(stbi__context *s)
6082 {
6083    int r = (stbi__get32be(s) == 0x38425053);
6084    stbi__rewind(s);
6085    return r;
6086 }
6087
6088 static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
6089 {
6090    int count, nleft, len;
6091
6092    count = 0;
6093    while ((nleft = pixelCount - count) > 0) {
6094       len = stbi__get8(s);
6095       if (len == 128) {
6096          // No-op.
6097       } else if (len < 128) {
6098          // Copy next len+1 bytes literally.
6099          len++;
6100          if (len > nleft) return 0; // corrupt data
6101          count += len;
6102          while (len) {
6103             *p = stbi__get8(s);
6104             p += 4;
6105             len--;
6106          }
6107       } else if (len > 128) {
6108          stbi_uc   val;
6109          // Next -len+1 bytes in the dest are replicated from next source byte.
6110          // (Interpret len as a negative 8-bit int.)
6111          len = 257 - len;
6112          if (len > nleft) return 0; // corrupt data
6113          val = stbi__get8(s);
6114          count += len;
6115          while (len) {
6116             *p = val;
6117             p += 4;
6118             len--;
6119          }
6120       }
6121    }
6122
6123    return 1;
6124 }
6125
6126 static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
6127 {
6128    int pixelCount;
6129    int channelCount, compression;
6130    int channel, i;
6131    int bitdepth;
6132    int w,h;
6133    stbi_uc *out;
6134    STBI_NOTUSED(ri);
6135
6136    // Check identifier
6137    if (stbi__get32be(s) != 0x38425053)   // "8BPS"
6138       return stbi__errpuc("not PSD", "Corrupt PSD image");
6139
6140    // Check file type version.
6141    if (stbi__get16be(s) != 1)
6142       return stbi__errpuc("wrong version", "Unsupported version of PSD image");
6143
6144    // Skip 6 reserved bytes.
6145    stbi__skip(s, 6 );
6146
6147    // Read the number of channels (R, G, B, A, etc).
6148    channelCount = stbi__get16be(s);
6149    if (channelCount < 0 || channelCount > 16)
6150       return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
6151
6152    // Read the rows and columns of the image.
6153    h = stbi__get32be(s);
6154    w = stbi__get32be(s);
6155
6156    if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6157    if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6158
6159    // Make sure the depth is 8 bits.
6160    bitdepth = stbi__get16be(s);
6161    if (bitdepth != 8 && bitdepth != 16)
6162       return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
6163
6164    // Make sure the color mode is RGB.
6165    // Valid options are:
6166    //   0: Bitmap
6167    //   1: Grayscale
6168    //   2: Indexed color
6169    //   3: RGB color
6170    //   4: CMYK color
6171    //   7: Multichannel
6172    //   8: Duotone
6173    //   9: Lab color
6174    if (stbi__get16be(s) != 3)
6175       return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
6176
6177    // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
6178    stbi__skip(s,stbi__get32be(s) );
6179
6180    // Skip the image resources.  (resolution, pen tool paths, etc)
6181    stbi__skip(s, stbi__get32be(s) );
6182
6183    // Skip the reserved data.
6184    stbi__skip(s, stbi__get32be(s) );
6185
6186    // Find out if the data is compressed.
6187    // Known values:
6188    //   0: no compression
6189    //   1: RLE compressed
6190    compression = stbi__get16be(s);
6191    if (compression > 1)
6192       return stbi__errpuc("bad compression", "PSD has an unknown compression format");
6193
6194    // Check size
6195    if (!stbi__mad3sizes_valid(4, w, h, 0))
6196       return stbi__errpuc("too large", "Corrupt PSD");
6197
6198    // Create the destination image.
6199
6200    if (!compression && bitdepth == 16 && bpc == 16) {
6201       out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
6202       ri->bits_per_channel = 16;
6203    } else
6204       out = (stbi_uc *) stbi__malloc(4 * w*h);
6205
6206    if (!out) return stbi__errpuc("outofmem", "Out of memory");
6207    pixelCount = w*h;
6208
6209    // Initialize the data to zero.
6210    //memset( out, 0, pixelCount * 4 );
6211
6212    // Finally, the image data.
6213    if (compression) {
6214       // RLE as used by .PSD and .TIFF
6215       // Loop until you get the number of unpacked bytes you are expecting:
6216       //     Read the next source byte into n.
6217       //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
6218       //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
6219       //     Else if n is 128, noop.
6220       // Endloop
6221
6222       // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
6223       // which we're going to just skip.
6224       stbi__skip(s, h * channelCount * 2 );
6225
6226       // Read the RLE data by channel.
6227       for (channel = 0; channel < 4; channel++) {
6228          stbi_uc *p;
6229
6230          p = out+channel;
6231          if (channel >= channelCount) {
6232             // Fill this channel with default data.
6233             for (i = 0; i < pixelCount; i++, p += 4)
6234                *p = (channel == 3 ? 255 : 0);
6235          } else {
6236             // Read the RLE data.
6237             if (!stbi__psd_decode_rle(s, p, pixelCount)) {
6238                STBI_FREE(out);
6239                return stbi__errpuc("corrupt", "bad RLE data");
6240             }
6241          }
6242       }
6243
6244    } else {
6245       // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
6246       // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
6247
6248       // Read the data by channel.
6249       for (channel = 0; channel < 4; channel++) {
6250          if (channel >= channelCount) {
6251             // Fill this channel with default data.
6252             if (bitdepth == 16 && bpc == 16) {
6253                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
6254                stbi__uint16 val = channel == 3 ? 65535 : 0;
6255                for (i = 0; i < pixelCount; i++, q += 4)
6256                   *q = val;
6257             } else {
6258                stbi_uc *p = out+channel;
6259                stbi_uc val = channel == 3 ? 255 : 0;
6260                for (i = 0; i < pixelCount; i++, p += 4)
6261                   *p = val;
6262             }
6263          } else {
6264             if (ri->bits_per_channel == 16) {    // output bpc
6265                stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
6266                for (i = 0; i < pixelCount; i++, q += 4)
6267                   *q = (stbi__uint16) stbi__get16be(s);
6268             } else {
6269                stbi_uc *p = out+channel;
6270                if (bitdepth == 16) {  // input bpc
6271                   for (i = 0; i < pixelCount; i++, p += 4)
6272                      *p = (stbi_uc) (stbi__get16be(s) >> 8);
6273                } else {
6274                   for (i = 0; i < pixelCount; i++, p += 4)
6275                      *p = stbi__get8(s);
6276                }
6277             }
6278          }
6279       }
6280    }
6281
6282    // remove weird white matte from PSD
6283    if (channelCount >= 4) {
6284       if (ri->bits_per_channel == 16) {
6285          for (i=0; i < w*h; ++i) {
6286             stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
6287             if (pixel[3] != 0 && pixel[3] != 65535) {
6288                float a = pixel[3] / 65535.0f;
6289                float ra = 1.0f / a;
6290                float inv_a = 65535.0f * (1 - ra);
6291                pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
6292                pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
6293                pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
6294             }
6295          }
6296       } else {
6297          for (i=0; i < w*h; ++i) {
6298             unsigned char *pixel = out + 4*i;
6299             if (pixel[3] != 0 && pixel[3] != 255) {
6300                float a = pixel[3] / 255.0f;
6301                float ra = 1.0f / a;
6302                float inv_a = 255.0f * (1 - ra);
6303                pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
6304                pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
6305                pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
6306             }
6307          }
6308       }
6309    }
6310
6311    // convert to desired output format
6312    if (req_comp && req_comp != 4) {
6313       if (ri->bits_per_channel == 16)
6314          out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
6315       else
6316          out = stbi__convert_format(out, 4, req_comp, w, h);
6317       if (out == NULL) return out; // stbi__convert_format frees input on failure
6318    }
6319
6320    if (comp) *comp = 4;
6321    *y = h;
6322    *x = w;
6323
6324    return out;
6325 }
6326 #endif
6327
6328 // *************************************************************************************************
6329 // Softimage PIC loader
6330 // by Tom Seddon
6331 //
6332 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
6333 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
6334
6335 #ifndef STBI_NO_PIC
6336 static int stbi__pic_is4(stbi__context *s,const char *str)
6337 {
6338    int i;
6339    for (i=0; i<4; ++i)
6340       if (stbi__get8(s) != (stbi_uc)str[i])
6341          return 0;
6342
6343    return 1;
6344 }
6345
6346 static int stbi__pic_test_core(stbi__context *s)
6347 {
6348    int i;
6349
6350    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
6351       return 0;
6352
6353    for(i=0;i<84;++i)
6354       stbi__get8(s);
6355
6356    if (!stbi__pic_is4(s,"PICT"))
6357       return 0;
6358
6359    return 1;
6360 }
6361
6362 typedef struct
6363 {
6364    stbi_uc size,type,channel;
6365 } stbi__pic_packet;
6366
6367 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
6368 {
6369    int mask=0x80, i;
6370
6371    for (i=0; i<4; ++i, mask>>=1) {
6372       if (channel & mask) {
6373          if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
6374          dest[i]=stbi__get8(s);
6375       }
6376    }
6377
6378    return dest;
6379 }
6380
6381 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
6382 {
6383    int mask=0x80,i;
6384
6385    for (i=0;i<4; ++i, mask>>=1)
6386       if (channel&mask)
6387          dest[i]=src[i];
6388 }
6389
6390 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
6391 {
6392    int act_comp=0,num_packets=0,y,chained;
6393    stbi__pic_packet packets[10];
6394
6395    // this will (should...) cater for even some bizarre stuff like having data
6396     // for the same channel in multiple packets.
6397    do {
6398       stbi__pic_packet *packet;
6399
6400       if (num_packets==sizeof(packets)/sizeof(packets[0]))
6401          return stbi__errpuc("bad format","too many packets");
6402
6403       packet = &packets[num_packets++];
6404
6405       chained = stbi__get8(s);
6406       packet->size    = stbi__get8(s);
6407       packet->type    = stbi__get8(s);
6408       packet->channel = stbi__get8(s);
6409
6410       act_comp |= packet->channel;
6411
6412       if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
6413       if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
6414    } while (chained);
6415
6416    *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
6417
6418    for(y=0; y<height; ++y) {
6419       int packet_idx;
6420
6421       for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
6422          stbi__pic_packet *packet = &packets[packet_idx];
6423          stbi_uc *dest = result+y*width*4;
6424
6425          switch (packet->type) {
6426             default:
6427                return stbi__errpuc("bad format","packet has bad compression type");
6428
6429             case 0: {//uncompressed
6430                int x;
6431
6432                for(x=0;x<width;++x, dest+=4)
6433                   if (!stbi__readval(s,packet->channel,dest))
6434                      return 0;
6435                break;
6436             }
6437
6438             case 1://Pure RLE
6439                {
6440                   int left=width, i;
6441
6442                   while (left>0) {
6443                      stbi_uc count,value[4];
6444
6445                      count=stbi__get8(s);
6446                      if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
6447
6448                      if (count > left)
6449                         count = (stbi_uc) left;
6450
6451                      if (!stbi__readval(s,packet->channel,value))  return 0;
6452
6453                      for(i=0; i<count; ++i,dest+=4)
6454                         stbi__copyval(packet->channel,dest,value);
6455                      left -= count;
6456                   }
6457                }
6458                break;
6459
6460             case 2: {//Mixed RLE
6461                int left=width;
6462                while (left>0) {
6463                   int count = stbi__get8(s), i;
6464                   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
6465
6466                   if (count >= 128) { // Repeated
6467                      stbi_uc value[4];
6468
6469                      if (count==128)
6470                         count = stbi__get16be(s);
6471                      else
6472                         count -= 127;
6473                      if (count > left)
6474                         return stbi__errpuc("bad file","scanline overrun");
6475
6476                      if (!stbi__readval(s,packet->channel,value))
6477                         return 0;
6478
6479                      for(i=0;i<count;++i, dest += 4)
6480                         stbi__copyval(packet->channel,dest,value);
6481                   } else { // Raw
6482                      ++count;
6483                      if (count>left) return stbi__errpuc("bad file","scanline overrun");
6484
6485                      for(i=0;i<count;++i, dest+=4)
6486                         if (!stbi__readval(s,packet->channel,dest))
6487                            return 0;
6488                   }
6489                   left-=count;
6490                }
6491                break;
6492             }
6493          }
6494       }
6495    }
6496
6497    return result;
6498 }
6499
6500 static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
6501 {
6502    stbi_uc *result;
6503    int i, x,y, internal_comp;
6504    STBI_NOTUSED(ri);
6505
6506    if (!comp) comp = &internal_comp;
6507
6508    for (i=0; i<92; ++i)
6509       stbi__get8(s);
6510
6511    x = stbi__get16be(s);
6512    y = stbi__get16be(s);
6513
6514    if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6515    if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
6516
6517    if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
6518    if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
6519
6520    stbi__get32be(s); //skip `ratio'
6521    stbi__get16be(s); //skip `fields'
6522    stbi__get16be(s); //skip `pad'
6523
6524    // intermediate buffer is RGBA
6525    result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
6526    if (!result) return stbi__errpuc("outofmem", "Out of memory");
6527    memset(result, 0xff, x*y*4);
6528
6529    if (!stbi__pic_load_core(s,x,y,comp, result)) {
6530       STBI_FREE(result);
6531       result=0;
6532    }
6533    *px = x;
6534    *py = y;
6535    if (req_comp == 0) req_comp = *comp;
6536    result=stbi__convert_format(result,4,req_comp,x,y);
6537
6538    return result;
6539 }
6540
6541 static int stbi__pic_test(stbi__context *s)
6542 {
6543    int r = stbi__pic_test_core(s);
6544    stbi__rewind(s);
6545    return r;
6546 }
6547 #endif
6548
6549 // *************************************************************************************************
6550 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
6551
6552 #ifndef STBI_NO_GIF
6553 typedef struct
6554 {
6555    stbi__int16 prefix;
6556    stbi_uc first;
6557    stbi_uc suffix;
6558 } stbi__gif_lzw;
6559
6560 typedef struct
6561 {
6562    int w,h;
6563    stbi_uc *out;                 // output buffer (always 4 components)
6564    stbi_uc *background;          // The current "background" as far as a gif is concerned
6565    stbi_uc *history;
6566    int flags, bgindex, ratio, transparent, eflags;
6567    stbi_uc  pal[256][4];
6568    stbi_uc lpal[256][4];
6569    stbi__gif_lzw codes[8192];
6570    stbi_uc *color_table;
6571    int parse, step;
6572    int lflags;
6573    int start_x, start_y;
6574    int max_x, max_y;
6575    int cur_x, cur_y;
6576    int line_size;
6577    int delay;
6578 } stbi__gif;
6579
6580 static int stbi__gif_test_raw(stbi__context *s)
6581 {
6582    int sz;
6583    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
6584    sz = stbi__get8(s);
6585    if (sz != '9' && sz != '7') return 0;
6586    if (stbi__get8(s) != 'a') return 0;
6587    return 1;
6588 }
6589
6590 static int stbi__gif_test(stbi__context *s)
6591 {
6592    int r = stbi__gif_test_raw(s);
6593    stbi__rewind(s);
6594    return r;
6595 }
6596
6597 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
6598 {
6599    int i;
6600    for (i=0; i < num_entries; ++i) {
6601       pal[i][2] = stbi__get8(s);
6602       pal[i][1] = stbi__get8(s);
6603       pal[i][0] = stbi__get8(s);
6604       pal[i][3] = transp == i ? 0 : 255;
6605    }
6606 }
6607
6608 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
6609 {
6610    stbi_uc version;
6611    if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
6612       return stbi__err("not GIF", "Corrupt GIF");
6613
6614    version = stbi__get8(s);
6615    if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
6616    if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
6617
6618    stbi__g_failure_reason = "";
6619    g->w = stbi__get16le(s);
6620    g->h = stbi__get16le(s);
6621    g->flags = stbi__get8(s);
6622    g->bgindex = stbi__get8(s);
6623    g->ratio = stbi__get8(s);
6624    g->transparent = -1;
6625
6626    if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
6627    if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
6628
6629    if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
6630
6631    if (is_info) return 1;
6632
6633    if (g->flags & 0x80)
6634       stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
6635
6636    return 1;
6637 }
6638
6639 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
6640 {
6641    stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
6642    if (!g) return stbi__err("outofmem", "Out of memory");
6643    if (!stbi__gif_header(s, g, comp, 1)) {
6644       STBI_FREE(g);
6645       stbi__rewind( s );
6646       return 0;
6647    }
6648    if (x) *x = g->w;
6649    if (y) *y = g->h;
6650    STBI_FREE(g);
6651    return 1;
6652 }
6653
6654 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
6655 {
6656    stbi_uc *p, *c;
6657    int idx;
6658
6659    // recurse to decode the prefixes, since the linked-list is backwards,
6660    // and working backwards through an interleaved image would be nasty
6661    if (g->codes[code].prefix >= 0)
6662       stbi__out_gif_code(g, g->codes[code].prefix);
6663
6664    if (g->cur_y >= g->max_y) return;
6665
6666    idx = g->cur_x + g->cur_y;
6667    p = &g->out[idx];
6668    g->history[idx / 4] = 1;
6669
6670    c = &g->color_table[g->codes[code].suffix * 4];
6671    if (c[3] > 128) { // don't render transparent pixels;
6672       p[0] = c[2];
6673       p[1] = c[1];
6674       p[2] = c[0];
6675       p[3] = c[3];
6676    }
6677    g->cur_x += 4;
6678
6679    if (g->cur_x >= g->max_x) {
6680       g->cur_x = g->start_x;
6681       g->cur_y += g->step;
6682
6683       while (g->cur_y >= g->max_y && g->parse > 0) {
6684          g->step = (1 << g->parse) * g->line_size;
6685          g->cur_y = g->start_y + (g->step >> 1);
6686          --g->parse;
6687       }
6688    }
6689 }
6690
6691 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
6692 {
6693    stbi_uc lzw_cs;
6694    stbi__int32 len, init_code;
6695    stbi__uint32 first;
6696    stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
6697    stbi__gif_lzw *p;
6698
6699    lzw_cs = stbi__get8(s);
6700    if (lzw_cs > 12) return NULL;
6701    clear = 1 << lzw_cs;
6702    first = 1;
6703    codesize = lzw_cs + 1;
6704    codemask = (1 << codesize) - 1;
6705    bits = 0;
6706    valid_bits = 0;
6707    for (init_code = 0; init_code < clear; init_code++) {
6708       g->codes[init_code].prefix = -1;
6709       g->codes[init_code].first = (stbi_uc) init_code;
6710       g->codes[init_code].suffix = (stbi_uc) init_code;
6711    }
6712
6713    // support no starting clear code
6714    avail = clear+2;
6715    oldcode = -1;
6716
6717    len = 0;
6718    for(;;) {
6719       if (valid_bits < codesize) {
6720          if (len == 0) {
6721             len = stbi__get8(s); // start new block
6722             if (len == 0)
6723                return g->out;
6724          }
6725          --len;
6726          bits |= (stbi__int32) stbi__get8(s) << valid_bits;
6727          valid_bits += 8;
6728       } else {
6729          stbi__int32 code = bits & codemask;
6730          bits >>= codesize;
6731          valid_bits -= codesize;
6732          // @OPTIMIZE: is there some way we can accelerate the non-clear path?
6733          if (code == clear) {  // clear code
6734             codesize = lzw_cs + 1;
6735             codemask = (1 << codesize) - 1;
6736             avail = clear + 2;
6737             oldcode = -1;
6738             first = 0;
6739          } else if (code == clear + 1) { // end of stream code
6740             stbi__skip(s, len);
6741             while ((len = stbi__get8(s)) > 0)
6742                stbi__skip(s,len);
6743             return g->out;
6744          } else if (code <= avail) {
6745             if (first) {
6746                return stbi__errpuc("no clear code", "Corrupt GIF");
6747             }
6748
6749             if (oldcode >= 0) {
6750                p = &g->codes[avail++];
6751                if (avail > 8192) {
6752                   return stbi__errpuc("too many codes", "Corrupt GIF");
6753                }
6754
6755                p->prefix = (stbi__int16) oldcode;
6756                p->first = g->codes[oldcode].first;
6757                p->suffix = (code == avail) ? p->first : g->codes[code].first;
6758             } else if (code == avail)
6759                return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6760
6761             stbi__out_gif_code(g, (stbi__uint16) code);
6762
6763             if ((avail & codemask) == 0 && avail <= 0x0FFF) {
6764                codesize++;
6765                codemask = (1 << codesize) - 1;
6766             }
6767
6768             oldcode = code;
6769          } else {
6770             return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6771          }
6772       }
6773    }
6774 }
6775
6776 // this function is designed to support animated gifs, although stb_image doesn't support it
6777 // two back is the image from two frames ago, used for a very specific disposal format
6778 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
6779 {
6780    int dispose;
6781    int first_frame;
6782    int pi;
6783    int pcount;
6784    STBI_NOTUSED(req_comp);
6785
6786    // on first frame, any non-written pixels get the background colour (non-transparent)
6787    first_frame = 0;
6788    if (g->out == 0) {
6789       if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
6790       if (!stbi__mad3sizes_valid(4, g->w, g->h, 0))
6791          return stbi__errpuc("too large", "GIF image is too large");
6792       pcount = g->w * g->h;
6793       g->out = (stbi_uc *) stbi__malloc(4 * pcount);
6794       g->background = (stbi_uc *) stbi__malloc(4 * pcount);
6795       g->history = (stbi_uc *) stbi__malloc(pcount);
6796       if (!g->out || !g->background || !g->history)
6797          return stbi__errpuc("outofmem", "Out of memory");
6798
6799       // image is treated as "transparent" at the start - ie, nothing overwrites the current background;
6800       // background colour is only used for pixels that are not rendered first frame, after that "background"
6801       // color refers to the color that was there the previous frame.
6802       memset(g->out, 0x00, 4 * pcount);
6803       memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent)
6804       memset(g->history, 0x00, pcount);        // pixels that were affected previous frame
6805       first_frame = 1;
6806    } else {
6807       // second frame - how do we dispose of the previous one?
6808       dispose = (g->eflags & 0x1C) >> 2;
6809       pcount = g->w * g->h;
6810
6811       if ((dispose == 3) && (two_back == 0)) {
6812          dispose = 2; // if I don't have an image to revert back to, default to the old background
6813       }
6814
6815       if (dispose == 3) { // use previous graphic
6816          for (pi = 0; pi < pcount; ++pi) {
6817             if (g->history[pi]) {
6818                memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 );
6819             }
6820          }
6821       } else if (dispose == 2) {
6822          // restore what was changed last frame to background before that frame;
6823          for (pi = 0; pi < pcount; ++pi) {
6824             if (g->history[pi]) {
6825                memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 );
6826             }
6827          }
6828       } else {
6829          // This is a non-disposal case eithe way, so just
6830          // leave the pixels as is, and they will become the new background
6831          // 1: do not dispose
6832          // 0:  not specified.
6833       }
6834
6835       // background is what out is after the undoing of the previou frame;
6836       memcpy( g->background, g->out, 4 * g->w * g->h );
6837    }
6838
6839    // clear my history;
6840    memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
6841
6842    for (;;) {
6843       int tag = stbi__get8(s);
6844       switch (tag) {
6845          case 0x2C: /* Image Descriptor */
6846          {
6847             stbi__int32 x, y, w, h;
6848             stbi_uc *o;
6849
6850             x = stbi__get16le(s);
6851             y = stbi__get16le(s);
6852             w = stbi__get16le(s);
6853             h = stbi__get16le(s);
6854             if (((x + w) > (g->w)) || ((y + h) > (g->h)))
6855                return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
6856
6857             g->line_size = g->w * 4;
6858             g->start_x = x * 4;
6859             g->start_y = y * g->line_size;
6860             g->max_x   = g->start_x + w * 4;
6861             g->max_y   = g->start_y + h * g->line_size;
6862             g->cur_x   = g->start_x;
6863             g->cur_y   = g->start_y;
6864
6865             // if the width of the specified rectangle is 0, that means
6866             // we may not see *any* pixels or the image is malformed;
6867             // to make sure this is caught, move the current y down to
6868             // max_y (which is what out_gif_code checks).
6869             if (w == 0)
6870                g->cur_y = g->max_y;
6871
6872             g->lflags = stbi__get8(s);
6873
6874             if (g->lflags & 0x40) {
6875                g->step = 8 * g->line_size; // first interlaced spacing
6876                g->parse = 3;
6877             } else {
6878                g->step = g->line_size;
6879                g->parse = 0;
6880             }
6881
6882             if (g->lflags & 0x80) {
6883                stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
6884                g->color_table = (stbi_uc *) g->lpal;
6885             } else if (g->flags & 0x80) {
6886                g->color_table = (stbi_uc *) g->pal;
6887             } else
6888                return stbi__errpuc("missing color table", "Corrupt GIF");
6889
6890             o = stbi__process_gif_raster(s, g);
6891             if (!o) return NULL;
6892
6893             // if this was the first frame,
6894             pcount = g->w * g->h;
6895             if (first_frame && (g->bgindex > 0)) {
6896                // if first frame, any pixel not drawn to gets the background color
6897                for (pi = 0; pi < pcount; ++pi) {
6898                   if (g->history[pi] == 0) {
6899                      g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be;
6900                      memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 );
6901                   }
6902                }
6903             }
6904
6905             return o;
6906          }
6907
6908          case 0x21: // Comment Extension.
6909          {
6910             int len;
6911             int ext = stbi__get8(s);
6912             if (ext == 0xF9) { // Graphic Control Extension.
6913                len = stbi__get8(s);
6914                if (len == 4) {
6915                   g->eflags = stbi__get8(s);
6916                   g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
6917
6918                   // unset old transparent
6919                   if (g->transparent >= 0) {
6920                      g->pal[g->transparent][3] = 255;
6921                   }
6922                   if (g->eflags & 0x01) {
6923                      g->transparent = stbi__get8(s);
6924                      if (g->transparent >= 0) {
6925                         g->pal[g->transparent][3] = 0;
6926                      }
6927                   } else {
6928                      // don't need transparent
6929                      stbi__skip(s, 1);
6930                      g->transparent = -1;
6931                   }
6932                } else {
6933                   stbi__skip(s, len);
6934                   break;
6935                }
6936             }
6937             while ((len = stbi__get8(s)) != 0) {
6938                stbi__skip(s, len);
6939             }
6940             break;
6941          }
6942
6943          case 0x3B: // gif stream termination code
6944             return (stbi_uc *) s; // using '1' causes warning on some compilers
6945
6946          default:
6947             return stbi__errpuc("unknown code", "Corrupt GIF");
6948       }
6949    }
6950 }
6951
6952 static void *stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays)
6953 {
6954    STBI_FREE(g->out);
6955    STBI_FREE(g->history);
6956    STBI_FREE(g->background);
6957
6958    if (out) STBI_FREE(out);
6959    if (delays && *delays) STBI_FREE(*delays);
6960    return stbi__errpuc("outofmem", "Out of memory");
6961 }
6962
6963 static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
6964 {
6965    if (stbi__gif_test(s)) {
6966       int layers = 0;
6967       stbi_uc *u = 0;
6968       stbi_uc *out = 0;
6969       stbi_uc *two_back = 0;
6970       stbi__gif g;
6971       int stride;
6972       int out_size = 0;
6973       int delays_size = 0;
6974
6975       STBI_NOTUSED(out_size);
6976       STBI_NOTUSED(delays_size);
6977
6978       memset(&g, 0, sizeof(g));
6979       if (delays) {
6980          *delays = 0;
6981       }
6982
6983       do {
6984          u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
6985          if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
6986
6987          if (u) {
6988             *x = g.w;
6989             *y = g.h;
6990             ++layers;
6991             stride = g.w * g.h * 4;
6992
6993             if (out) {
6994                void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride );
6995                if (!tmp)
6996                   return stbi__load_gif_main_outofmem(&g, out, delays);
6997                else {
6998                    out = (stbi_uc*) tmp;
6999                    out_size = layers * stride;
7000                }
7001
7002                if (delays) {
7003                   int *new_delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers );
7004                   if (!new_delays)
7005                      return stbi__load_gif_main_outofmem(&g, out, delays);
7006                   *delays = new_delays;
7007                   delays_size = layers * sizeof(int);
7008                }
7009             } else {
7010                out = (stbi_uc*)stbi__malloc( layers * stride );
7011                if (!out)
7012                   return stbi__load_gif_main_outofmem(&g, out, delays);
7013                out_size = layers * stride;
7014                if (delays) {
7015                   *delays = (int*) stbi__malloc( layers * sizeof(int) );
7016                   if (!*delays)
7017                      return stbi__load_gif_main_outofmem(&g, out, delays);
7018                   delays_size = layers * sizeof(int);
7019                }
7020             }
7021             memcpy( out + ((layers - 1) * stride), u, stride );
7022             if (layers >= 2) {
7023                two_back = out - 2 * stride;
7024             }
7025
7026             if (delays) {
7027                (*delays)[layers - 1U] = g.delay;
7028             }
7029          }
7030       } while (u != 0);
7031
7032       // free temp buffer;
7033       STBI_FREE(g.out);
7034       STBI_FREE(g.history);
7035       STBI_FREE(g.background);
7036
7037       // do the final conversion after loading everything;
7038       if (req_comp && req_comp != 4)
7039          out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
7040
7041       *z = layers;
7042       return out;
7043    } else {
7044       return stbi__errpuc("not GIF", "Image was not as a gif type.");
7045    }
7046 }
7047
7048 static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
7049 {
7050    stbi_uc *u = 0;
7051    stbi__gif g;
7052    memset(&g, 0, sizeof(g));
7053    STBI_NOTUSED(ri);
7054
7055    u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
7056    if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
7057    if (u) {
7058       *x = g.w;
7059       *y = g.h;
7060
7061       // moved conversion to after successful load so that the same
7062       // can be done for multiple frames.
7063       if (req_comp && req_comp != 4)
7064          u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
7065    } else if (g.out) {
7066       // if there was an error and we allocated an image buffer, free it!
7067       STBI_FREE(g.out);
7068    }
7069
7070    // free buffers needed for multiple frame loading;
7071    STBI_FREE(g.history);
7072    STBI_FREE(g.background);
7073
7074    return u;
7075 }
7076
7077 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
7078 {
7079    return stbi__gif_info_raw(s,x,y,comp);
7080 }
7081 #endif
7082
7083 // *************************************************************************************************
7084 // Radiance RGBE HDR loader
7085 // originally by Nicolas Schulz
7086 #ifndef STBI_NO_HDR
7087 static int stbi__hdr_test_core(stbi__context *s, const char *signature)
7088 {
7089    int i;
7090    for (i=0; signature[i]; ++i)
7091       if (stbi__get8(s) != signature[i])
7092           return 0;
7093    stbi__rewind(s);
7094    return 1;
7095 }
7096
7097 static int stbi__hdr_test(stbi__context* s)
7098 {
7099    int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
7100    stbi__rewind(s);
7101    if(!r) {
7102        r = stbi__hdr_test_core(s, "#?RGBE\n");
7103        stbi__rewind(s);
7104    }
7105    return r;
7106 }
7107
7108 #define STBI__HDR_BUFLEN  1024
7109 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
7110 {
7111    int len=0;
7112    char c = '\0';
7113
7114    c = (char) stbi__get8(z);
7115
7116    while (!stbi__at_eof(z) && c != '\n') {
7117       buffer[len++] = c;
7118       if (len == STBI__HDR_BUFLEN-1) {
7119          // flush to end of line
7120          while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
7121             ;
7122          break;
7123       }
7124       c = (char) stbi__get8(z);
7125    }
7126
7127    buffer[len] = 0;
7128    return buffer;
7129 }
7130
7131 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
7132 {
7133    if ( input[3] != 0 ) {
7134       float f1;
7135       // Exponent
7136       f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
7137       if (req_comp <= 2)
7138          output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
7139       else {
7140          output[0] = input[0] * f1;
7141          output[1] = input[1] * f1;
7142          output[2] = input[2] * f1;
7143       }
7144       if (req_comp == 2) output[1] = 1;
7145       if (req_comp == 4) output[3] = 1;
7146    } else {
7147       switch (req_comp) {
7148          case 4: output[3] = 1; /* fallthrough */
7149          case 3: output[0] = output[1] = output[2] = 0;
7150                  break;
7151          case 2: output[1] = 1; /* fallthrough */
7152          case 1: output[0] = 0;
7153                  break;
7154       }
7155    }
7156 }
7157
7158 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
7159 {
7160    char buffer[STBI__HDR_BUFLEN];
7161    char *token;
7162    int valid = 0;
7163    int width, height;
7164    stbi_uc *scanline;
7165    float *hdr_data;
7166    int len;
7167    unsigned char count, value;
7168    int i, j, k, c1,c2, z;
7169    const char *headerToken;
7170    STBI_NOTUSED(ri);
7171
7172    // Check identifier
7173    headerToken = stbi__hdr_gettoken(s,buffer);
7174    if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
7175       return stbi__errpf("not HDR", "Corrupt HDR image");
7176
7177    // Parse header
7178    for(;;) {
7179       token = stbi__hdr_gettoken(s,buffer);
7180       if (token[0] == 0) break;
7181       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
7182    }
7183
7184    if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
7185
7186    // Parse width and height
7187    // can't use sscanf() if we're not using stdio!
7188    token = stbi__hdr_gettoken(s,buffer);
7189    if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
7190    token += 3;
7191    height = (int) strtol(token, &token, 10);
7192    while (*token == ' ') ++token;
7193    if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
7194    token += 3;
7195    width = (int) strtol(token, NULL, 10);
7196
7197    if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
7198    if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
7199
7200    *x = width;
7201    *y = height;
7202
7203    if (comp) *comp = 3;
7204    if (req_comp == 0) req_comp = 3;
7205
7206    if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
7207       return stbi__errpf("too large", "HDR image is too large");
7208
7209    // Read data
7210    hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
7211    if (!hdr_data)
7212       return stbi__errpf("outofmem", "Out of memory");
7213
7214    // Load image data
7215    // image data is stored as some number of sca
7216    if ( width < 8 || width >= 32768) {
7217       // Read flat data
7218       for (j=0; j < height; ++j) {
7219          for (i=0; i < width; ++i) {
7220             stbi_uc rgbe[4];
7221            main_decode_loop:
7222             stbi__getn(s, rgbe, 4);
7223             stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
7224          }
7225       }
7226    } else {
7227       // Read RLE-encoded data
7228       scanline = NULL;
7229
7230       for (j = 0; j < height; ++j) {
7231          c1 = stbi__get8(s);
7232          c2 = stbi__get8(s);
7233          len = stbi__get8(s);
7234          if (c1 != 2 || c2 != 2 || (len & 0x80)) {
7235             // not run-length encoded, so we have to actually use THIS data as a decoded
7236             // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
7237             stbi_uc rgbe[4];
7238             rgbe[0] = (stbi_uc) c1;
7239             rgbe[1] = (stbi_uc) c2;
7240             rgbe[2] = (stbi_uc) len;
7241             rgbe[3] = (stbi_uc) stbi__get8(s);
7242             stbi__hdr_convert(hdr_data, rgbe, req_comp);
7243             i = 1;
7244             j = 0;
7245             STBI_FREE(scanline);
7246             goto main_decode_loop; // yes, this makes no sense
7247          }
7248          len <<= 8;
7249          len |= stbi__get8(s);
7250          if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
7251          if (scanline == NULL) {
7252             scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
7253             if (!scanline) {
7254                STBI_FREE(hdr_data);
7255                return stbi__errpf("outofmem", "Out of memory");
7256             }
7257          }
7258
7259          for (k = 0; k < 4; ++k) {
7260             int nleft;
7261             i = 0;
7262             while ((nleft = width - i) > 0) {
7263                count = stbi__get8(s);
7264                if (count > 128) {
7265                   // Run
7266                   value = stbi__get8(s);
7267                   count -= 128;
7268                   if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
7269                   for (z = 0; z < count; ++z)
7270                      scanline[i++ * 4 + k] = value;
7271                } else {
7272                   // Dump
7273                   if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
7274                   for (z = 0; z < count; ++z)
7275                      scanline[i++ * 4 + k] = stbi__get8(s);
7276                }
7277             }
7278          }
7279          for (i=0; i < width; ++i)
7280             stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
7281       }
7282       if (scanline)
7283          STBI_FREE(scanline);
7284    }
7285
7286    return hdr_data;
7287 }
7288
7289 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
7290 {
7291    char buffer[STBI__HDR_BUFLEN];
7292    char *token;
7293    int valid = 0;
7294    int dummy;
7295
7296    if (!x) x = &dummy;
7297    if (!y) y = &dummy;
7298    if (!comp) comp = &dummy;
7299
7300    if (stbi__hdr_test(s) == 0) {
7301        stbi__rewind( s );
7302        return 0;
7303    }
7304
7305    for(;;) {
7306       token = stbi__hdr_gettoken(s,buffer);
7307       if (token[0] == 0) break;
7308       if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
7309    }
7310
7311    if (!valid) {
7312        stbi__rewind( s );
7313        return 0;
7314    }
7315    token = stbi__hdr_gettoken(s,buffer);
7316    if (strncmp(token, "-Y ", 3)) {
7317        stbi__rewind( s );
7318        return 0;
7319    }
7320    token += 3;
7321    *y = (int) strtol(token, &token, 10);
7322    while (*token == ' ') ++token;
7323    if (strncmp(token, "+X ", 3)) {
7324        stbi__rewind( s );
7325        return 0;
7326    }
7327    token += 3;
7328    *x = (int) strtol(token, NULL, 10);
7329    *comp = 3;
7330    return 1;
7331 }
7332 #endif // STBI_NO_HDR
7333
7334 #ifndef STBI_NO_BMP
7335 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
7336 {
7337    void *p;
7338    stbi__bmp_data info;
7339
7340    info.all_a = 255;
7341    p = stbi__bmp_parse_header(s, &info);
7342    if (p == NULL) {
7343       stbi__rewind( s );
7344       return 0;
7345    }
7346    if (x) *x = s->img_x;
7347    if (y) *y = s->img_y;
7348    if (comp) {
7349       if (info.bpp == 24 && info.ma == 0xff000000)
7350          *comp = 3;
7351       else
7352          *comp = info.ma ? 4 : 3;
7353    }
7354    return 1;
7355 }
7356 #endif
7357
7358 #ifndef STBI_NO_PSD
7359 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
7360 {
7361    int channelCount, dummy, depth;
7362    if (!x) x = &dummy;
7363    if (!y) y = &dummy;
7364    if (!comp) comp = &dummy;
7365    if (stbi__get32be(s) != 0x38425053) {
7366        stbi__rewind( s );
7367        return 0;
7368    }
7369    if (stbi__get16be(s) != 1) {
7370        stbi__rewind( s );
7371        return 0;
7372    }
7373    stbi__skip(s, 6);
7374    channelCount = stbi__get16be(s);
7375    if (channelCount < 0 || channelCount > 16) {
7376        stbi__rewind( s );
7377        return 0;
7378    }
7379    *y = stbi__get32be(s);
7380    *x = stbi__get32be(s);
7381    depth = stbi__get16be(s);
7382    if (depth != 8 && depth != 16) {
7383        stbi__rewind( s );
7384        return 0;
7385    }
7386    if (stbi__get16be(s) != 3) {
7387        stbi__rewind( s );
7388        return 0;
7389    }
7390    *comp = 4;
7391    return 1;
7392 }
7393
7394 static int stbi__psd_is16(stbi__context *s)
7395 {
7396    int channelCount, depth;
7397    if (stbi__get32be(s) != 0x38425053) {
7398        stbi__rewind( s );
7399        return 0;
7400    }
7401    if (stbi__get16be(s) != 1) {
7402        stbi__rewind( s );
7403        return 0;
7404    }
7405    stbi__skip(s, 6);
7406    channelCount = stbi__get16be(s);
7407    if (channelCount < 0 || channelCount > 16) {
7408        stbi__rewind( s );
7409        return 0;
7410    }
7411    STBI_NOTUSED(stbi__get32be(s));
7412    STBI_NOTUSED(stbi__get32be(s));
7413    depth = stbi__get16be(s);
7414    if (depth != 16) {
7415        stbi__rewind( s );
7416        return 0;
7417    }
7418    return 1;
7419 }
7420 #endif
7421
7422 #ifndef STBI_NO_PIC
7423 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
7424 {
7425    int act_comp=0,num_packets=0,chained,dummy;
7426    stbi__pic_packet packets[10];
7427
7428    if (!x) x = &dummy;
7429    if (!y) y = &dummy;
7430    if (!comp) comp = &dummy;
7431
7432    if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
7433       stbi__rewind(s);
7434       return 0;
7435    }
7436
7437    stbi__skip(s, 88);
7438
7439    *x = stbi__get16be(s);
7440    *y = stbi__get16be(s);
7441    if (stbi__at_eof(s)) {
7442       stbi__rewind( s);
7443       return 0;
7444    }
7445    if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
7446       stbi__rewind( s );
7447       return 0;
7448    }
7449
7450    stbi__skip(s, 8);
7451
7452    do {
7453       stbi__pic_packet *packet;
7454
7455       if (num_packets==sizeof(packets)/sizeof(packets[0]))
7456          return 0;
7457
7458       packet = &packets[num_packets++];
7459       chained = stbi__get8(s);
7460       packet->size    = stbi__get8(s);
7461       packet->type    = stbi__get8(s);
7462       packet->channel = stbi__get8(s);
7463       act_comp |= packet->channel;
7464
7465       if (stbi__at_eof(s)) {
7466           stbi__rewind( s );
7467           return 0;
7468       }
7469       if (packet->size != 8) {
7470           stbi__rewind( s );
7471           return 0;
7472       }
7473    } while (chained);
7474
7475    *comp = (act_comp & 0x10 ? 4 : 3);
7476
7477    return 1;
7478 }
7479 #endif
7480
7481 // *************************************************************************************************
7482 // Portable Gray Map and Portable Pixel Map loader
7483 // by Ken Miller
7484 //
7485 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
7486 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
7487 //
7488 // Known limitations:
7489 //    Does not support comments in the header section
7490 //    Does not support ASCII image data (formats P2 and P3)
7491
7492 #ifndef STBI_NO_PNM
7493
7494 static int      stbi__pnm_test(stbi__context *s)
7495 {
7496    char p, t;
7497    p = (char) stbi__get8(s);
7498    t = (char) stbi__get8(s);
7499    if (p != 'P' || (t != '5' && t != '6')) {
7500        stbi__rewind( s );
7501        return 0;
7502    }
7503    return 1;
7504 }
7505
7506 static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
7507 {
7508    stbi_uc *out;
7509    STBI_NOTUSED(ri);
7510
7511    ri->bits_per_channel = stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n);
7512    if (ri->bits_per_channel == 0)
7513       return 0;
7514
7515    if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
7516    if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
7517
7518    *x = s->img_x;
7519    *y = s->img_y;
7520    if (comp) *comp = s->img_n;
7521
7522    if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0))
7523       return stbi__errpuc("too large", "PNM too large");
7524
7525    out = (stbi_uc *) stbi__malloc_mad4(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0);
7526    if (!out) return stbi__errpuc("outofmem", "Out of memory");
7527    if (!stbi__getn(s, out, s->img_n * s->img_x * s->img_y * (ri->bits_per_channel / 8))) {
7528       STBI_FREE(out);
7529       return stbi__errpuc("bad PNM", "PNM file truncated");
7530    }
7531
7532    if (req_comp && req_comp != s->img_n) {
7533       if (ri->bits_per_channel == 16) {
7534          out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, s->img_n, req_comp, s->img_x, s->img_y);
7535       } else {
7536          out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
7537       }
7538       if (out == NULL) return out; // stbi__convert_format frees input on failure
7539    }
7540    return out;
7541 }
7542
7543 static int      stbi__pnm_isspace(char c)
7544 {
7545    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
7546 }
7547
7548 static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
7549 {
7550    for (;;) {
7551       while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
7552          *c = (char) stbi__get8(s);
7553
7554       if (stbi__at_eof(s) || *c != '#')
7555          break;
7556
7557       while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
7558          *c = (char) stbi__get8(s);
7559    }
7560 }
7561
7562 static int      stbi__pnm_isdigit(char c)
7563 {
7564    return c >= '0' && c <= '9';
7565 }
7566
7567 static int      stbi__pnm_getinteger(stbi__context *s, char *c)
7568 {
7569    int value = 0;
7570
7571    while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
7572       value = value*10 + (*c - '0');
7573       *c = (char) stbi__get8(s);
7574       if((value > 214748364) || (value == 214748364 && *c > '7'))
7575           return stbi__err("integer parse overflow", "Parsing an integer in the PPM header overflowed a 32-bit int");
7576    }
7577
7578    return value;
7579 }
7580
7581 static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
7582 {
7583    int maxv, dummy;
7584    char c, p, t;
7585
7586    if (!x) x = &dummy;
7587    if (!y) y = &dummy;
7588    if (!comp) comp = &dummy;
7589
7590    stbi__rewind(s);
7591
7592    // Get identifier
7593    p = (char) stbi__get8(s);
7594    t = (char) stbi__get8(s);
7595    if (p != 'P' || (t != '5' && t != '6')) {
7596        stbi__rewind(s);
7597        return 0;
7598    }
7599
7600    *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
7601
7602    c = (char) stbi__get8(s);
7603    stbi__pnm_skip_whitespace(s, &c);
7604
7605    *x = stbi__pnm_getinteger(s, &c); // read width
7606    if(*x == 0)
7607        return stbi__err("invalid width", "PPM image header had zero or overflowing width");
7608    stbi__pnm_skip_whitespace(s, &c);
7609
7610    *y = stbi__pnm_getinteger(s, &c); // read height
7611    if (*y == 0)
7612        return stbi__err("invalid width", "PPM image header had zero or overflowing width");
7613    stbi__pnm_skip_whitespace(s, &c);
7614
7615    maxv = stbi__pnm_getinteger(s, &c);  // read max value
7616    if (maxv > 65535)
7617       return stbi__err("max value > 65535", "PPM image supports only 8-bit and 16-bit images");
7618    else if (maxv > 255)
7619       return 16;
7620    else
7621       return 8;
7622 }
7623
7624 static int stbi__pnm_is16(stbi__context *s)
7625 {
7626    if (stbi__pnm_info(s, NULL, NULL, NULL) == 16)
7627            return 1;
7628    return 0;
7629 }
7630 #endif
7631
7632 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
7633 {
7634    #ifndef STBI_NO_JPEG
7635    if (stbi__jpeg_info(s, x, y, comp)) return 1;
7636    #endif
7637
7638    #ifndef STBI_NO_PNG
7639    if (stbi__png_info(s, x, y, comp))  return 1;
7640    #endif
7641
7642    #ifndef STBI_NO_GIF
7643    if (stbi__gif_info(s, x, y, comp))  return 1;
7644    #endif
7645
7646    #ifndef STBI_NO_BMP
7647    if (stbi__bmp_info(s, x, y, comp))  return 1;
7648    #endif
7649
7650    #ifndef STBI_NO_PSD
7651    if (stbi__psd_info(s, x, y, comp))  return 1;
7652    #endif
7653
7654    #ifndef STBI_NO_PIC
7655    if (stbi__pic_info(s, x, y, comp))  return 1;
7656    #endif
7657
7658    #ifndef STBI_NO_PNM
7659    if (stbi__pnm_info(s, x, y, comp))  return 1;
7660    #endif
7661
7662    #ifndef STBI_NO_HDR
7663    if (stbi__hdr_info(s, x, y, comp))  return 1;
7664    #endif
7665
7666    // test tga last because it's a crappy test!
7667    #ifndef STBI_NO_TGA
7668    if (stbi__tga_info(s, x, y, comp))
7669        return 1;
7670    #endif
7671    return stbi__err("unknown image type", "Image not of any known type, or corrupt");
7672 }
7673
7674 static int stbi__is_16_main(stbi__context *s)
7675 {
7676    #ifndef STBI_NO_PNG
7677    if (stbi__png_is16(s))  return 1;
7678    #endif
7679
7680    #ifndef STBI_NO_PSD
7681    if (stbi__psd_is16(s))  return 1;
7682    #endif
7683
7684    #ifndef STBI_NO_PNM
7685    if (stbi__pnm_is16(s))  return 1;
7686    #endif
7687    return 0;
7688 }
7689
7690 #ifndef STBI_NO_STDIO
7691 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
7692 {
7693     FILE *f = stbi__fopen(filename, "rb");
7694     int result;
7695     if (!f) return stbi__err("can't fopen", "Unable to open file");
7696     result = stbi_info_from_file(f, x, y, comp);
7697     fclose(f);
7698     return result;
7699 }
7700
7701 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
7702 {
7703    int r;
7704    stbi__context s;
7705    long pos = ftell(f);
7706    stbi__start_file(&s, f);
7707    r = stbi__info_main(&s,x,y,comp);
7708    fseek(f,pos,SEEK_SET);
7709    return r;
7710 }
7711
7712 STBIDEF int stbi_is_16_bit(char const *filename)
7713 {
7714     FILE *f = stbi__fopen(filename, "rb");
7715     int result;
7716     if (!f) return stbi__err("can't fopen", "Unable to open file");
7717     result = stbi_is_16_bit_from_file(f);
7718     fclose(f);
7719     return result;
7720 }
7721
7722 STBIDEF int stbi_is_16_bit_from_file(FILE *f)
7723 {
7724    int r;
7725    stbi__context s;
7726    long pos = ftell(f);
7727    stbi__start_file(&s, f);
7728    r = stbi__is_16_main(&s);
7729    fseek(f,pos,SEEK_SET);
7730    return r;
7731 }
7732 #endif // !STBI_NO_STDIO
7733
7734 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
7735 {
7736    stbi__context s;
7737    stbi__start_mem(&s,buffer,len);
7738    return stbi__info_main(&s,x,y,comp);
7739 }
7740
7741 STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
7742 {
7743    stbi__context s;
7744    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
7745    return stbi__info_main(&s,x,y,comp);
7746 }
7747
7748 STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
7749 {
7750    stbi__context s;
7751    stbi__start_mem(&s,buffer,len);
7752    return stbi__is_16_main(&s);
7753 }
7754
7755 STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
7756 {
7757    stbi__context s;
7758    stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
7759    return stbi__is_16_main(&s);
7760 }
7761
7762 #endif // STB_IMAGE_IMPLEMENTATION
7763
7764 /*
7765    revision history:
7766       2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
7767       2.19  (2018-02-11) fix warning
7768       2.18  (2018-01-30) fix warnings
7769       2.17  (2018-01-29) change sbti__shiftsigned to avoid clang -O2 bug
7770                          1-bit BMP
7771                          *_is_16_bit api
7772                          avoid warnings
7773       2.16  (2017-07-23) all functions have 16-bit variants;
7774                          STBI_NO_STDIO works again;
7775                          compilation fixes;
7776                          fix rounding in unpremultiply;
7777                          optimize vertical flip;
7778                          disable raw_len validation;
7779                          documentation fixes
7780       2.15  (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
7781                          warning fixes; disable run-time SSE detection on gcc;
7782                          uniform handling of optional "return" values;
7783                          thread-safe initialization of zlib tables
7784       2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
7785       2.13  (2016-11-29) add 16-bit API, only supported for PNG right now
7786       2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
7787       2.11  (2016-04-02) allocate large structures on the stack
7788                          remove white matting for transparent PSD
7789                          fix reported channel count for PNG & BMP
7790                          re-enable SSE2 in non-gcc 64-bit
7791                          support RGB-formatted JPEG
7792                          read 16-bit PNGs (only as 8-bit)
7793       2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
7794       2.09  (2016-01-16) allow comments in PNM files
7795                          16-bit-per-pixel TGA (not bit-per-component)
7796                          info() for TGA could break due to .hdr handling
7797                          info() for BMP to shares code instead of sloppy parse
7798                          can use STBI_REALLOC_SIZED if allocator doesn't support realloc
7799                          code cleanup
7800       2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
7801       2.07  (2015-09-13) fix compiler warnings
7802                          partial animated GIF support
7803                          limited 16-bpc PSD support
7804                          #ifdef unused functions
7805                          bug with < 92 byte PIC,PNM,HDR,TGA
7806       2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
7807       2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
7808       2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
7809       2.03  (2015-04-12) extra corruption checking (mmozeiko)
7810                          stbi_set_flip_vertically_on_load (nguillemot)
7811                          fix NEON support; fix mingw support
7812       2.02  (2015-01-19) fix incorrect assert, fix warning
7813       2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
7814       2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
7815       2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
7816                          progressive JPEG (stb)
7817                          PGM/PPM support (Ken Miller)
7818                          STBI_MALLOC,STBI_REALLOC,STBI_FREE
7819                          GIF bugfix -- seemingly never worked
7820                          STBI_NO_*, STBI_ONLY_*
7821       1.48  (2014-12-14) fix incorrectly-named assert()
7822       1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
7823                          optimize PNG (ryg)
7824                          fix bug in interlaced PNG with user-specified channel count (stb)
7825       1.46  (2014-08-26)
7826               fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
7827       1.45  (2014-08-16)
7828               fix MSVC-ARM internal compiler error by wrapping malloc
7829       1.44  (2014-08-07)
7830               various warning fixes from Ronny Chevalier
7831       1.43  (2014-07-15)
7832               fix MSVC-only compiler problem in code changed in 1.42
7833       1.42  (2014-07-09)
7834               don't define _CRT_SECURE_NO_WARNINGS (affects user code)
7835               fixes to stbi__cleanup_jpeg path
7836               added STBI_ASSERT to avoid requiring assert.h
7837       1.41  (2014-06-25)
7838               fix search&replace from 1.36 that messed up comments/error messages
7839       1.40  (2014-06-22)
7840               fix gcc struct-initialization warning
7841       1.39  (2014-06-15)
7842               fix to TGA optimization when req_comp != number of components in TGA;
7843               fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
7844               add support for BMP version 5 (more ignored fields)
7845       1.38  (2014-06-06)
7846               suppress MSVC warnings on integer casts truncating values
7847               fix accidental rename of 'skip' field of I/O
7848       1.37  (2014-06-04)
7849               remove duplicate typedef
7850       1.36  (2014-06-03)
7851               convert to header file single-file library
7852               if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
7853       1.35  (2014-05-27)
7854               various warnings
7855               fix broken STBI_SIMD path
7856               fix bug where stbi_load_from_file no longer left file pointer in correct place
7857               fix broken non-easy path for 32-bit BMP (possibly never used)
7858               TGA optimization by Arseny Kapoulkine
7859       1.34  (unknown)
7860               use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
7861       1.33  (2011-07-14)
7862               make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
7863       1.32  (2011-07-13)
7864               support for "info" function for all supported filetypes (SpartanJ)
7865       1.31  (2011-06-20)
7866               a few more leak fixes, bug in PNG handling (SpartanJ)
7867       1.30  (2011-06-11)
7868               added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
7869               removed deprecated format-specific test/load functions
7870               removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
7871               error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
7872               fix inefficiency in decoding 32-bit BMP (David Woo)
7873       1.29  (2010-08-16)
7874               various warning fixes from Aurelien Pocheville
7875       1.28  (2010-08-01)
7876               fix bug in GIF palette transparency (SpartanJ)
7877       1.27  (2010-08-01)
7878               cast-to-stbi_uc to fix warnings
7879       1.26  (2010-07-24)
7880               fix bug in file buffering for PNG reported by SpartanJ
7881       1.25  (2010-07-17)
7882               refix trans_data warning (Won Chun)
7883       1.24  (2010-07-12)
7884               perf improvements reading from files on platforms with lock-heavy fgetc()
7885               minor perf improvements for jpeg
7886               deprecated type-specific functions so we'll get feedback if they're needed
7887               attempt to fix trans_data warning (Won Chun)
7888       1.23    fixed bug in iPhone support
7889       1.22  (2010-07-10)
7890               removed image *writing* support
7891               stbi_info support from Jetro Lauha
7892               GIF support from Jean-Marc Lienher
7893               iPhone PNG-extensions from James Brown
7894               warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
7895       1.21    fix use of 'stbi_uc' in header (reported by jon blow)
7896       1.20    added support for Softimage PIC, by Tom Seddon
7897       1.19    bug in interlaced PNG corruption check (found by ryg)
7898       1.18  (2008-08-02)
7899               fix a threading bug (local mutable static)
7900       1.17    support interlaced PNG
7901       1.16    major bugfix - stbi__convert_format converted one too many pixels
7902       1.15    initialize some fields for thread safety
7903       1.14    fix threadsafe conversion bug
7904               header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
7905       1.13    threadsafe
7906       1.12    const qualifiers in the API
7907       1.11    Support installable IDCT, colorspace conversion routines
7908       1.10    Fixes for 64-bit (don't use "unsigned long")
7909               optimized upsampling by Fabian "ryg" Giesen
7910       1.09    Fix format-conversion for PSD code (bad global variables!)
7911       1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
7912       1.07    attempt to fix C++ warning/errors again
7913       1.06    attempt to fix C++ warning/errors again
7914       1.05    fix TGA loading to return correct *comp and use good luminance calc
7915       1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
7916       1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
7917       1.02    support for (subset of) HDR files, float interface for preferred access to them
7918       1.01    fix bug: possible bug in handling right-side up bmps... not sure
7919               fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
7920       1.00    interface to zlib that skips zlib header
7921       0.99    correct handling of alpha in palette
7922       0.98    TGA loader by lonesock; dynamically add loaders (untested)
7923       0.97    jpeg errors on too large a file; also catch another malloc failure
7924       0.96    fix detection of invalid v value - particleman@mollyrocket forum
7925       0.95    during header scan, seek to markers in case of padding
7926       0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
7927       0.93    handle jpegtran output; verbose errors
7928       0.92    read 4,8,16,24,32-bit BMP files of several formats
7929       0.91    output 24-bit Windows 3.0 BMP files
7930       0.90    fix a few more warnings; bump version number to approach 1.0
7931       0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
7932       0.60    fix compiling as c++
7933       0.59    fix warnings: merge Dave Moore's -Wall fixes
7934       0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
7935       0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
7936       0.56    fix bug: zlib uncompressed mode len vs. nlen
7937       0.55    fix bug: restart_interval not initialized to 0
7938       0.54    allow NULL for 'int *comp'
7939       0.53    fix bug in png 3->4; speedup png decoding
7940       0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
7941       0.51    obey req_comp requests, 1-component jpegs return as 1-component,
7942               on 'test' only check type, not whether we support this variant
7943       0.50  (2006-11-19)
7944               first released version
7945 */
7946
7947
7948 /*
7949 ------------------------------------------------------------------------------
7950 This software is available under 2 licenses -- choose whichever you prefer.
7951 ------------------------------------------------------------------------------
7952 ALTERNATIVE A - MIT License
7953 Copyright (c) 2017 Sean Barrett
7954 Permission is hereby granted, free of charge, to any person obtaining a copy of
7955 this software and associated documentation files (the "Software"), to deal in
7956 the Software without restriction, including without limitation the rights to
7957 use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
7958 of the Software, and to permit persons to whom the Software is furnished to do
7959 so, subject to the following conditions:
7960 The above copyright notice and this permission notice shall be included in all
7961 copies or substantial portions of the Software.
7962 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
7963 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
7964 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
7965 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
7966 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
7967 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
7968 SOFTWARE.
7969 ------------------------------------------------------------------------------
7970 ALTERNATIVE B - Public Domain (www.unlicense.org)
7971 This is free and unencumbered software released into the public domain.
7972 Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
7973 software, either in source code form or as a compiled binary, for any purpose,
7974 commercial or non-commercial, and by any means.
7975 In jurisdictions that recognize copyright laws, the author or authors of this
7976 software dedicate any and all copyright interest in the software to the public
7977 domain. We make this dedication for the benefit of the public at large and to
7978 the detriment of our heirs and successors. We intend this dedication to be an
7979 overt act of relinquishment in perpetuity of all present and future rights to
7980 this software under copyright law.
7981 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
7982 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
7983 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
7984 AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
7985 ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
7986 WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
7987 ------------------------------------------------------------------------------
7988 */