[vsnet-chat 7572] mined data - many LPVs : the intro

substellar at Safe-mail.net substellar at Safe-mail.net
Mon Jul 26 02:16:03 JST 2010


You hear a lot about datamining.  In practice it's more like panning: sifting through sludge and grit as it flows down the datastream in the hope of getting a nugget of gold, with small returns compared to hitting a rich seam.  And although sometimes something of some small worth may crop up, it's not necessarily gold, nor any indication of another piece being likely to turn up anytime soon thereafter.  There's a lot of dross in the datastream: rather than rich metalliferous ores with thick seams, the usual case is silty dross with the rare nugget.

Observers aren't usually too bothered about what exists in archived data, so a useful example of a good paper is better based on panning observational data, rather than archival data, and one very good recent paper was this one http://arxiv.org/abs/1007.2684 .  Folded lightcurves of periodic objects are presented for the reader to check the results.  Of the 1318 new variables one is declared known, and on double checking their list I couldn't even find that one.  The text states that the "record breaking" shortest period eclipsing binaries could well be reddened short period pulsators, as at low amplitudes these two types are difficult to distinguish.  This is almost unique.  Most papers provide immense electronic lists of innumerable objects, often only candidates, a vast number of them being constant; many of the true variable stars are already known despite claims of newness, because only SIMBAD was used to check for known variable stars; no phaseplots are given to check the validity of the at times wrong periods when the things are variable; and the estimates of the percentage of field stars that are detectably variable are massively overblown.  Even the recent Kepler eclipser paper, which goes down to millimagnitude amplitudes, only detected 1.2% of field objects as variable, possibly extending to 1.5 to 2% if short period pulsators and long period stars are included, and decreasing dramatically if +/- 0.1 magnitudes is used as a cut off.  This applies to professional and amateur papers alike.

Now, if someone wants to datamine, there's an easy way.  Sometimes it's said that all red stars are variable.  This isn't quite true.  Probably all stars are variable if you have a magic telescope that can see variations down to micromagnitudes, so we'll read "variable" as "detectably variable from normal ground based epoch photometry" and set the threshold at about 0.1 magnitudes amplitude.  Then we can say all red giant stars are probably variable at that level.  And quite a lot of them are; it's still not entirely true, but it is far harder to find the constant ones.

Now, apparently optically bright stars usually appear in allsky surveys at other passbands (except radio and gamma rays).  Naked eye stars are either very close or intrinsically very luminous, so they will crop up in UV and IR and xray object lists.  We need a cut off then.  The cut off can be anywhere, and can be different for different data, but if we've say got epoch photometry in the I band from mag 12 to 18 or so, then the red stars that crop up in that range will mostly not be of the nearby or intrinsically luminous variety (though some of those will also crop up).  Some will be overexposed, being red and it being an I band, and coincidentally there will be some nearby and/or intrinsically luminous objects.  If it's a relatively small field, these are reduced in number.

OGLE II data is some of the most precise photometric data around.  At some magnitude ranges it allows a periodic or even nonperiodic variable of as little as 0.05 I amplitude to be noted and recognised above the noise level, and this from a land based telescope.  It's also some of the most easily presented and usable data: you can readily get a column of JD and mag from the photometry server.

So when someone asked me recently if I'd done anything with the AKARI data, I remembered I had, some time ago.  I crossmatched it with the OGLE II Disc datafiles I bled out of that system some time ago.  Now, I tend not to let stuff out publicly any more, as it tends to get appropriated, indifferent and unexciting as it is, and what's more the appropriators tend to claim no knowledge of the provenance of their methodologies, and indeed at times even say I did it all wrong despite exactly mirroring the rather unimaginative and basic methodologies that I have used.

But if you want to datamine, and get freebie easy red variables, even if they're of the ilk of only being labelled as irregulars, with a very high success rate, very little dross and even fewer false alarms, well, this http://ogledb.astrouw.edu.pl/~ogle/photdb/phot_query.html allows you to generate large lists of OGLE objects which can be matched against AKARI IRC using VizieR list investigation.  OGLE II is sometimes an arcsec or two off astrometrically, though often spot on, and AKARI IRC is nominally about two arcsecs off most of the time, but can be worse, and can be spot on (as can happen with a random distribution).

You choose your search radius accordingly.  Too small, and you miss stuff; too large, and the number of false alarms is significant enough to markedly affect the success rate.
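The radius trade-off is easy to try out offline before going near VizieR.  What follows is only a minimal sketch in Python, not what VizieR does internally: the function names, the brute force nearest neighbour loop and the (name, ra, dec) tuples are all assumptions made for illustration, with positions taken to be in decimal degrees.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds; all inputs in decimal degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate at arcsecond separations
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0

def crossmatch(optical, infrared, radius_arcsec):
    """Nearest infrared source within radius_arcsec for each optical source.

    Both lists hold (name, ra_deg, dec_deg) tuples (hypothetical layout).
    Brute force, so only sensible for lists of a few thousand objects.
    """
    matches = []
    for oname, ora, odec in optical:
        best = None
        for iname, ira, idec in infrared:
            sep = angular_sep_arcsec(ora, odec, ira, idec)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (iname, sep)
        if best is not None:
            matches.append((oname, best[0], best[1]))
    return matches
```

Running crossmatch() on the same two lists at, say, 2, 3 and 5 arcsec and counting the matches at each radius gives a feel for where extra matches stop being real counterparts and start being chance alignments.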

And then of course there is the matter of that already known and published.  OGLE teams themselves have published innumerable red variables from the LMC and SMC, even including those from the more recent OGLE III effort.  Not all of these are in VizieR.  The Galactic Bulge survey has a publication of 15,000 red variables, not in AAVSO VSX, last I looked, but possibly in VizieR, yep, Wray et al 
http://cdsarc.u-strasbg.fr/viz-bin/Cat?J/MNRAS/349/1059 . There are two separate lists of 3000 plus Miras from the Bulge, one by a European team, one by a Japanese team.  And there are Miras in each that are unique to each list.  That is, they did not recover each other's Miras entirely, so the case is possibly not complete for Miras.  However, most of them will have been found.  And there are other lists too.

But for the Galactic Disc, well, 100 or so Miras have been published, and some tens more added onto various online databases, but still not the complete 200 or so someone estimated.  And next to no semiregular variables have been logged, just a handful, with few if any of SRS type (low amplitude "short" period red variables), and not many Lb.

The list below has a very high success rate for these.  There are even about 20 as yet unsolved and unlisted Miras, maybe more, I haven't checked them all.  Many SR, some SRS, and possibly an exotic or two, maybe a hidden YSO or RCB.

There will be few false alarms, but there may be mismatches.  Brighter OGLE II objects when plotted up graphically will show 'haloes' of fainter artefact objects surrounding them, sometimes displaying the variability of the main source.  This can be especially confusing in the cases where the actual variable is not in the OGLE II data as it was too bright and overexposed, yet the ghost artefacts are listed there.  Also, sometimes, as with many surveys, a field star can mimic in its lightcurve the variability of an adjacent field star, or can look variable instead of constant because of noise from a bright adjacent field star.  You can regularly see people making this mistake.  It is usually resolved by the lightcurve being far noisier than usual, or by a suite of these objects existing, only one of which is the real one (or, as said, the real one is not listed due to having saturated the image).

But other than that, cut and paste the below into a text editor, save it as ao.html, load that into your browser, and peruse the lightcurves one by one to see the rare case of _datamined_ variable stars, ie near infrared sources matched to epoch photometry predominantly producing stellar objects with varying epoch photometry.  Many of the Miras and some very few of the semiregulars will already be listed somewhere, but most of the semiregulars and SRS won't be, and indeed I expect them to suddenly appear in an unprovenanced sort of way soon enough, probably in an excel spreadsheet notified on obscure maillists.  Some other variability types, besides red excess objects and other red objects (eg YSOs and RCBs), may also crop up occasionally, and it will be difficult to be sure what some of the irregular low amplitude objects are, despite Lb being a safe-ish guess.  And every now and again a constant one does crop up.

But the point is that anyone can roll their own just using the above OGLE II photometric database and the VizieR service list coordinate import check against the AKARI IRC catalogue.  The STARID column will give the OGLE II positions, and regions can be downloaded in ascii text, but the coords will have to be edited to the sexagesimal input format of hh mm ss.s dd mm ss that VizieR requires as coordinate list input.  Save results to tab separated format, use a non-Strasbourg mirror (they're usually faster) and don't overdo the search radius, to cut down on false alarms.  Note that mismatches are not impossible, ie bad matching of AKARI to OGLE II objects, whereas false alarms are objects that are correctly crossmatched but are in fact not variable; in this particular datamining exercise the false alarms are far fewer than the true variable stars.  There are lots and lots of variables here, real ones, not candidates and hoped fors, but they're mostly not the particularly exciting kind.
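The coordinate editing can be scripted if the downloaded positions happen to be in decimal degrees.  A minimal Python sketch follows; the function name is made up here, and the exact output layout (space separated, sign in front of the declination) is an assumption, so check it against what the VizieR list input form actually accepts.

```python
def to_sexagesimal(ra_deg, dec_deg):
    """Format decimal degrees as 'hh mm ss.s +dd mm ss' (assumed layout)."""
    # Work in integer tenths of a second of time so rounding never gives 60.0
    ra_tenths = round(ra_deg / 15.0 * 3600.0 * 10.0) % (24 * 3600 * 10)
    h, rem = divmod(ra_tenths, 3600 * 10)
    m, s_tenths = divmod(rem, 60 * 10)

    # Declination to the nearest whole arcsecond, sign handled separately
    sign = "-" if dec_deg < 0 else "+"
    dec_arcsec = round(abs(dec_deg) * 3600.0)
    d, rem = divmod(dec_arcsec, 3600)
    dm, ds = divmod(rem, 60)

    return "%02d %02d %04.1f %s%02d %02d %02d" % (
        h, m, s_tenths / 10.0, sign, d, dm, ds)
```

For example, to_sexagesimal(180.0, -60.5) returns '12 00 00.0 -60 30 00'.  Rounding is done on integers before splitting into fields, which avoids output like "59 60.0" near a minute boundary.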

Hopefully the paste won't wordwrap, but if it does, each line begins <a href= and ends <br>, sometimes <br><br> to break the list up into groups of tens.
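A list in that layout can also be generated rather than pasted.  The Python sketch below is just an illustration: the URL pattern is copied from the highlight links in this post, while the function name, the link text and the (field, starid) input pairs are assumptions.

```python
BASE = "http://ogledb.astrouw.edu.pl/~ogle/photdb/getobj.php"

def link_lines(objects, group=10):
    """One '<a href=...>...<br>' line per (field, starid) pair.

    Every group-th line ends in <br><br> so the list breaks into blocks.
    """
    lines = []
    for n, (field, starid) in enumerate(objects, start=1):
        url = "%s?field=%s&starid=%d&db=DIA" % (BASE, field, starid)
        tail = "<br><br>" if n % group == 0 else "<br>"
        # Link text of 'field starid' is a hypothetical choice
        lines.append('<a href="%s">%s %d</a>%s' % (url, field, starid, tail))
    return lines
```

Writing "\n".join(link_lines(pairs)) to ao.html gives a page that can be clicked through one lightcurve at a time.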

Here are some selected highlight ones out of the nearly 800 to play with :-

http://ogledb.astrouw.edu.pl/~ogle/photdb/getobj.php?field=cen_sc1&starid=11376&db=DIA an unlisted borderline Mira/SRA

http://ogledb.astrouw.edu.pl/~ogle/photdb/getobj.php?field=cen_sc1&starid=27602&db=DIA an unlisted Mira

http://ogledb.astrouw.edu.pl/~ogle/photdb/getobj.php?field=cen_sc1&starid=87980&db=DIA some sort of mostly steadily fading thing, J-Ks a large 2.76, not visible in DSS2 red or blue, but is on DSS2 I.  RCB or postAGB in general????  Exotica

http://ogledb.astrouw.edu.pl/~ogle/photdb/getobj.php?field=cen_sc1&starid=87962&db=DIA a good demonstration of the quality OGLE II photometry can have, this 0.1 amplitude plot includes error bars

http://ogledb.astrouw.edu.pl/~ogle/photdb/getobj.php?field=cen_sc1&starid=78363&db=DIA ???

http://ogledb.astrouw.edu.pl/~ogle/photdb/getobj.php?field=cen_sc1&starid=77180&db=DIA another unlisted Mira

and these are just highlights (there were many more variable stars in that field not mentioned) from the first Centaurus field, of which there are 4, and there are 8 Norma and 8 Scorpius.  None of the 4 Carina fields it seems, which is strange, but I did this when the first raw AKARI catalogue came out, so I can't remember why not.

Miras from assured success datamining routines are a good way for beginners to learn researching accurate positions, period analysis (periods don't have to be to any decimal places for example, and are usually obvious enough, even if the phaseplot of Miras is never perfect due to their inconstant periods), and searching online resources to see if the object is known or not, as well as being easy to recognise for what they are.
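For a beginner wanting to try the period analysis on a downloaded column of JD and mag, here is a minimal Python sketch.  It is not the method used for the list; it folds the data on a few whole-day trial periods and scores each by a simple "string length" (how far the magnitude wanders when the points are taken in phase order), which is one of the more basic textbook approaches.  All function names are made up for illustration.

```python
def fold(jd, period, epoch=None):
    """Phase (0 to 1) of each date for a trial period in days."""
    if epoch is None:
        epoch = min(jd)
    return [((t - epoch) / period) % 1.0 for t in jd]

def string_length(jd, mag, period):
    """Sum of magnitude jumps between phase-ordered neighbours.

    A good period gives a smooth folded curve and so a short string.
    """
    ordered = sorted(zip(fold(jd, period), mag))
    return sum(abs(ordered[i + 1][1] - ordered[i][1])
               for i in range(len(ordered) - 1))

def best_period(jd, mag, trial_periods):
    """Trial period with the shortest string; whole days are plenty for Miras."""
    return min(trial_periods, key=lambda p: string_length(jd, mag, p))
```

Keep the trial grid to a sensible range, since multiples of the true period also fold smoothly, and always look at the phaseplot itself: a Mira's cycles never repeat exactly, so the score is only a guide.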

Due to an earlier post of this not getting there, possibly because I was offering the list something bigger than it could cope with, I have cut this into three: this, the intro, and the list in two halves.

Cheers

John

